INTRODUCTION



Field of the Invention

The field of this invention is molecular biology, particularly recombinant
DNA engineering.

Background of the Invention 

The processes of isolating, cloning and expressing genes are central to the 
field of molecular biology and play prominent roles in research and industry in 
biotechnology and related fields. Until recently, the isolation and cloning of genes 
has been achieved in vitro using restriction endonucleases and DNA ligases.
Restriction endonucleases are enzymes which recognize and cleave double- 
stranded DNA at a specific nucleotide sequence, and DNA ligases are enzymes 
which join fragments of DNA together via the phosphodiester bond. A DNA 
sequence of interest can be "cut" or digested into manageable pieces using a 
restriction endonuclease and then inserted into an appropriate vector for cloning
using DNA ligase. However, in order to transfer the DNA of interest into a
different vector (most often a specialized expression vector), restriction enzymes
must be used again to excise the DNA of interest from the cloning vector, and 
then DNA ligase is used again to ligate the DNA of interest into the chosen 
expression vector. 

The ability to transfer a DNA of interest to an appropriate expression vector
is often limited by the availability or suitability of restriction enzyme recognition
sites. Often multiple restriction enzymes must be employed to remove the
desired coding region. Further, the reaction conditions used for each enzyme
may differ such that it is necessary to perform the excision reaction in separate
steps, or it may be necessary to remove a particular enzyme used in an initial
restriction enzyme reaction prior to completing subsequent restriction enzyme
digestions due to buffer and/or cofactor incompatibility. Many of these extra steps
require time-consuming purification of the subcloning intermediate.

There is, therefore, a need to develop protocols and compositions for the
rapid transfer of a DNA molecule of interest from one vector to another in vitro or
in vivo without the need to rely upon restriction enzyme digestions. To address
this need, a number of different sequence-specific recombinase-based methods
have been developed which allow one to transfer sequence material among
vectors without restriction enzyme digestions. These systems include the
commercially available Creator and Gateway sequence-specific
recombinase-based methods, where representative systems are described in
U.S. Patent Nos. 5,851,808 and 5,888,732; as well as in Published PCT
Application Serial Nos. WO 00/12687 and WO 01/05961.

While the above protocols and systems are effective, there is room for
improvement. For example, in the above systems, expression vectors that are
produced by the methods encode fusion proteins of the gene of interest fused to
a sequence encoded by the sequence specific recombinase site of the vector. In
many instances, such a fusion sequence is undesirable.

As such, there is continued interest in the improvement of these sequence
specific recombinase systems. Of particular interest would be the development of
such a system that produced expression vectors where the protein of interest was
not expressed as a fusion with sequence specific recombinase encoded sequences.
The present invention satisfies this interest.

Relevant Literature 

References of interest include: U.S. Patent Nos. 5,527,695; 5,744,336;
5,851,808; 5,888,732; and 5,962,255; as well as Published PCT Application
Serial Nos. WO 00/12687 and WO 01/05961. Also of interest are: Kaartinen &
Nagy, Genesis (2001) 31: 126-129; and Yoshimura et al., Mol. Urol. (2001) 5: 81-4.

SUMMARY OF THE INVENTION 
Methods are provided for producing a vector that includes at least one 
splicable intron. In the subject methods, intron containing vectors are produced 
from donor and acceptor vectors that each include a sequence specific 
recombinase site, where the subject donor and acceptor vectors further include 
splice donor and acceptor sites that, upon sequence specific recombination of the 
donor and acceptor vectors, define an intron in the product vector of the 
recombination step. Also provided are compositions for use in practicing the 
subject methods, including the donor and acceptor vectors themselves, as well as 
systems and kits that include the same. The subject invention finds use in a 
variety of different applications, including the production of expression vectors 
that encode C-terminal tagged fusion proteins, the production of expression 
vectors that encode pure protein and not a fusion thereof with N- and/or C- 
terminal sequence specific recombinase site encoded residues, and the like. 

BRIEF DESCRIPTION OF THE FIGURES 
Figure 1 provides a map of the pDNR-Dual donor vector described in
greater detail below.

Figure 2 provides a map of the pLPS-EGFP acceptor vector described in
greater detail below.

Figure 3 provides a map of the pDNR-Dual-Luc vector described in
greater detail below.

Figure 4 provides a map of the pLPS-Luc-EGFP vector described in 
greater detail below. 

Figure 5 provides a flow diagram of a representative method according to 
the subject invention. 



DEFINITIONS 

The terms "sequence-specific recombinase" and "site-specific
recombinase" refer to enzymes or recombinases that recognize and bind to a
short nucleic acid site or "sequence-specific recombinase target site", i.e., a
recombinase recognition site, and catalyze the recombination of nucleic acid in
relation to these sites. These enzymes include recombinases, transposases and
integrases.

The terms "sequence-specific recombinase target site", "site-specific
recombinase target site", "sequence-specific target site" and "site-specific target
site" refer to short nucleic acid sites or sequences, i.e., recombinase recognition
sites, which are recognized by a sequence- or site-specific recombinase and
which become the crossover regions during a site-specific recombination event.
Examples of sequence-specific recombinase target sites include, but are not 
limited to, lox sites, att sites, dif sites and frt sites. 

The term "lox site" as used herein refers to a nucleotide sequence at which 
the product of the cre gene of bacteriophage P1, the Cre recombinase, can
catalyze a site-specific recombination event. A variety of lox sites are known in
the art, including the naturally occurring loxP, loxB, loxL and loxR, as well as a
number of mutant, or variant, lox sites, such as loxP511, loxP514, loxΔ86,
loxΔ117, loxC2, loxP2, loxP3 and loxP23.

The term "frt site" as used herein refers to a nucleotide sequence at which 
the product of the FLP gene of the yeast 2 micron plasmid, FLP recombinase,
can catalyze site-specific recombination.

The term "unique restriction enzyme site" indicates that the recognition
sequence of a given restriction enzyme appears once within a nucleic acid 
molecule. 
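
The uniqueness criterion above can be illustrated with a minimal sketch. The function names and the example sequences are hypothetical and are not part of the subject vectors; the sketch assumes a plain A/C/G/T sequence and counts a recognition sequence on either strand, optionally treating the molecule as circular.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(molecule: str, site: str, circular: bool = False) -> int:
    """Count occurrences of a recognition sequence on either strand."""
    molecule, site = molecule.upper(), site.upper()
    # For a circular molecule, extend the sequence so that sites spanning
    # the arbitrary start/end junction are also found.
    search = molecule + molecule[: len(site) - 1] if circular else molecule
    # A palindromic site equals its reverse complement, so the set has one entry.
    targets = {site, reverse_complement(site)}
    return sum(
        1
        for i in range(len(search) - len(site) + 1)
        if search[i : i + len(site)] in targets
    )


def is_unique_site(molecule: str, site: str, circular: bool = False) -> bool:
    """A site is "unique" if it appears exactly once in the molecule."""
    return count_sites(molecule, site, circular) == 1


# Hypothetical example: one EcoRI site (GAATTC), two BamHI sites (GGATCC).
example = "AAGAATTCTTGGATCCAAGGATCC"
print(is_unique_site(example, "GAATTC"))
print(is_unique_site(example, "GGATCC"))
```

In this example the EcoRI recognition sequence would qualify as a unique site, whereas the BamHI recognition sequence would not.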

A restriction enzyme site or restriction site is said to be located "adjacent to 
the 3' end of a sequence-specific recombinase target site" if the restriction 
enzyme recognition site is located downstream of the 3' end of the sequence- 
specific recombinase target site. The adjacent restriction enzyme site may, but 
need not, be contiguous with the last or 3' most nucleotide comprising the 
sequence-specific recombinase target site. 

The term "intron" as used herein refers to a domain of a vector produced 
by the subject methods that is flanked on the 5' end by a splice donor site and on 
the 3' end by a splice acceptor site, where under appropriate conditions the intron 
is spliced out of or removed from an mRNA sequence expressed from the vector 
in which it is present. 

The term "splice donor site" as used herein refers to a sequence or domain 
of a nucleic acid present at the 5' end of an intron, as defined above, that marks 
the start of the intron and its boundary with the preceding coding sequence
(exon).

The term "splice acceptor site" as used herein refers to a sequence or
domain of a nucleic acid present at the 3' end of an intron, as defined above, that
marks the end of the intron and its boundary with the following coding sequence
(exon). In the present invention, the splice acceptor site is also meant to include
the intron branch point, which is required together with the splice donor and
splice acceptor sequences in order for splicing to occur. The branch point marks
the point to which the 5' end of the intron becomes joined during the process of
splicing. For convenience, in the present embodiments, the splice acceptor
sequence and the branch site are placed adjacent to each other so that they can
be encoded within a single synthetic oligonucleotide for ease of vector
construction. Thus, they are described here as a single unit. However, they may
be further separated, by moving the branch site further 5' of the splice acceptor
sequence, provided that it is not moved 5' of the splice donor sequence and
provided that splicing efficiency is not hindered.
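
The relationship between the splice donor site, the splice acceptor site and the flanking exons can be sketched as follows. This is an illustrative simplification only: the function name and sequences are hypothetical, the sequence is written as DNA, and the only check applied is the canonical rule that an intron begins with GT and ends with AG. Branch point recognition and splicing efficiency are not modeled.

```python
def splice_out_intron(transcript: str, intron_start: int, intron_end: int) -> str:
    """Remove transcript[intron_start:intron_end] and join the flanking exons.

    intron_start is the index of the first intron base (start of the splice
    donor site); intron_end is the index just past the last intron base
    (end of the splice acceptor site).
    """
    intron = transcript[intron_start:intron_end].upper()
    # Canonical (GT-AG) introns start with GT and end with AG.
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GT-AG rule")
    return transcript[:intron_start] + transcript[intron_end:]


# Hypothetical sequences chosen only for illustration.
exon_1 = "ATGGCC"
intron = "GTAAGT" + "CCCC" + "TACTAAC" + "TTTTTTCAG"  # donor, filler, branch, acceptor
exon_2 = "GGCTAA"

pre_mrna = exon_1 + intron + exon_2
mature = splice_out_intron(pre_mrna, len(exon_1), len(exon_1) + len(intron))
print(mature)
```

The spliced product here is simply the two exons joined in frame, with the intervening sequence removed.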

The term "splice site" as used herein refers to a sequence or domain of a
nucleic acid present at either the 5' end or the 3' end of an intron as defined
above.

The terms "polylinker" or "multiple cloning site" refer to a cluster of 
restriction enzyme sites, typically unique sites, on a nucleic acid construct that 
can be utilized for the insertion and/or excision of nucleic acid sequences, such 
as the coding region of a gene, loxP sites, etc. 

The term "termination sequence" refers to a nucleic acid sequence which is
recognized by the polymerase of a host cell and results in the termination of
transcription. Prokaryotic termination sequences commonly comprise a GC-rich
region that has a two-fold symmetry followed by an AT-rich sequence. A
commonly used termination sequence is the T7 termination sequence. A variety
of termination sequences are known in the art and may be employed in the
nucleic acid constructs of the present invention, including the TINT3, TL13, TL2,
TR1, TR2, and T6S termination signals derived from the bacteriophage lambda,
and termination signals derived from bacterial genes, such as the trp gene of E.
coli.

The term "polyadenylation sequence" (also referred to as a "poly A+ site"
or "poly A+ sequence") as used herein denotes a DNA sequence which directs
both the termination and polyadenylation of the nascent RNA transcript. Efficient
polyadenylation of the recombinant transcript is desirable, as transcripts lacking a
poly A+ tail are typically unstable and rapidly degraded. The poly A+ signal
utilized in an expression vector may be "heterologous" or "endogenous". An
endogenous poly A+ signal is one that is found naturally at the 3' end of the
coding region of a given gene in the genome. A heterologous poly A+ signal is
one which is isolated from one gene and placed 3' of another gene, e.g., coding
sequence for a protein. A commonly used heterologous poly A+ signal is the
SV40 poly A+ signal. The SV40 poly A+ signal is contained on a 237 bp
BamHI/BclI restriction fragment and directs both termination and polyadenylation;
numerous vectors contain the SV40 poly A+ signal. Another commonly used
heterologous poly A+ signal is derived from the bovine growth hormone (BGH)
gene; the BGH poly A+ signal is also available on a number of commercially
available vectors. The poly A+ signal from the Herpes simplex virus thymidine
kinase (HSV tk) gene is also used as a poly A+ signal on a number of commercial
expression vectors.

As used herein, the terms "selectable marker" or "selectable marker gene" 
refer to a gene which encodes an enzymatic activity and confers the ability to 
grow in medium lacking what would otherwise be an essential nutrient; in 
addition, a selectable marker may confer upon the cell in which the selectable
marker is expressed, resistance to an antibiotic or drug. A selectable marker may 
be used to confer a particular phenotype upon a host cell. When a host cell must 
express a selectable marker to grow in selective medium, the marker is said to be 
a positive selectable marker (e.g., antibiotic resistance genes which confer the 
ability to grow in the presence of the appropriate antibiotic). Selectable markers
can also be used to select against host cells containing a particular gene; 
selectable markers used in this manner are referred to as negative selectable 
markers. 

As used herein, the term "construct" is used in reference to nucleic acid 
molecules that transfer DNA segment(s) from one cell to another. The term
"vector" is sometimes used interchangeably with "construct". The term "construct"
includes circular nucleic acid constructs such as plasmid constructs, phagemid
constructs, cosmid vectors, etc., as well as linear nucleic acid constructs
including, but not limited to, PCR products. The nucleic acid construct may
comprise expression signals such as a promoter and/or an enhancer in operable
linkage, and then is generally referred to as an "expression vector" or "expression 
construct". 

The term "expression construct" as used herein refers to an expression 
module or expression cassette made up of a recombinant DNA molecule 
containing a desired coding sequence and appropriate nucleic acid sequences
necessary for the expression of the operably linked coding sequence in a
particular host organism. Nucleic acid sequences necessary for expression in
prokaryotes usually include a promoter and a ribosome binding site, often along
with other sequences. Eukaryotic cells are known to utilize promoters,
enhancers, and termination and polyadenylation signals.

The terms "in operable combination", "in operable order" and "operably
linked" as used herein refer to the linkage of nucleic acid sequences in such a
manner that a nucleic acid molecule capable of directing the transcription of a
given gene and/or the synthesis of a desired protein molecule is produced. The
terms also refer to the linkage of amino acid sequences in such a manner so that
the reading frame is maintained and a functional protein is produced.

A cell has been "transformed" or "transfected" with exogenous or 
heterologous DNA when such DNA has been introduced inside the cell. The 
transforming DNA may or may not be integrated (covalently linked) into the 
genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the 
transforming DNA may be maintained on an episomal element such as a vector
or plasmid. With respect to eukaryotic cells, a stably transformed cell is one in
which the transforming DNA is inherited by daughter cells through chromosome
replication. This stability is demonstrated by the ability of the eukaryotic cell to
establish cell lines or clones comprised of a population of daughter cells
containing the transforming DNA. A "clone" is a population of cells derived from a
single cell or ancestor by mitosis. A "cell line" is a clone of a primary cell that is 
capable of stable growth in vitro for many generations. An organism, such as a 
plant or animal, that has been transformed with exogenous DNA is termed 
"transgenic". 

Transformation of prokaryotic cells may be accomplished by a variety of
means known in the art, including the treatment of host cells with CaCl2 to make
competent cells, electroporation, etc. Transfection of eukaryotic cells may be
accomplished by a variety of means known in the art, including calcium
phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection,
polybrene-mediated transfection, electroporation, microinjection, liposome fusion,
lipofection, protoplast fusion, retroviral infection, and biolistics.

As used herein, the term "host" is meant to include not only prokaryotes,
but also eukaryotes, such as yeast, plant and animal cells. A recombinant DNA
molecule or gene can be used to transform a host using any of the techniques
commonly known to those of ordinary skill in the art. Prokaryotic hosts may
include E. coli, S. typhimurium, Serratia marcescens and Bacillus subtilis.
Eukaryotic hosts include yeasts such as Saccharomyces cerevisiae,
Schizosaccharomyces pombe and Pichia pastoris, mammalian cells and insect cells,
and plant cells, such as Arabidopsis thaliana and Nicotiana tabacum.

As used herein, the terms "restriction endonucleases" and "restriction 
enzymes" refer to bacterial enzymes, each of which cuts double-stranded DNA at
or near a specific nucleotide sequence.

"Recombinant DNA technology" refers to techniques for uniting two
heterologous DNA molecules, usually as a result of in vitro ligation of DNAs from
different organisms. Recombinant DNA molecules are commonly produced by
experiments in genetic engineering. Synonymous terms include "gene splicing",
"molecular cloning" and "genetic engineering". The product of these
manipulations results in a "recombinant" or "recombinant molecule". The term
"recombinant protein" or "recombinant polypeptide" as used herein refers to a
protein molecule that is expressed from a recombinant DNA molecule.

The ribose sugar is a polar molecule, and therefore, DNA is referred to as
having a 5' to 3' directionality. DNA is said to have "5' ends" and "3'
ends" because mononucleotides are reacted to make oligonucleotides in a
manner such that the 5' phosphate of one mononucleotide pentose ring is
attached to the 3' oxygen of its neighbor via a phosphodiester linkage. Therefore,
an end of an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not
linked to the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its
3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide
pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger
oligonucleotide, also has a 5' to 3' orientation. In either a linear or circular DNA
molecule, discrete elements are referred to as being "upstream" or "5'" of the
"downstream" or "3'" elements. This terminology reflects the fact that DNA has an
inherent 5' to 3' polarity, and transcription typically proceeds in a 5' to 3' fashion
along the DNA strand. The promoter and enhancer elements which direct
transcription of an operably linked coding region, or open reading frame, are
generally located 5', or upstream, of the coding region. However, enhancer
elements can exert their effect even when located 3' of the promoter and coding
region. Transcription termination and polyadenylation signals are typically
located 3' or downstream of the coding region.

The 3' end of a promoter is said to be located upstream of the 5' end of a 
sequence-specific recombinase target site when, moving in a 5' to 3' direction 
along the nucleic acid molecule, the 3' terminus of a promoter precedes the 5'
end of the sequence-specific recombinase target site. When the acceptor
construct is intended to permit the expression of a translational fusion, the 3' end of
the promoter is located upstream of both the sequences encoding the amino-
terminus of a fusion protein and the 5' end of the sequence-specific recombinase
target site. Thus, the sequence-specific recombinase target site is located within
the coding region of the fusion protein (i.e., located downstream of both the
promoter and the sequences encoding the affinity domain, such as Gst).

As used herein, the term "adjacent", in the context of positioning of genetic
elements in the constructs, shall mean within about 0 to 2500, sometimes 0 to
1000 bp and sometimes within about 0 to 500, 0 to 400, 0 to 300 or 0 to 200 bp.

A DNA "coding sequence" is a double-stranded DNA sequence that is 
transcribed and translated into a polypeptide in vivo when placed under the 
control of appropriate regulatory sequences. The boundaries of the coding 
sequence are determined by a start codon at the 5' (amino) terminus and a 
translation stop codon at the 3' (carboxyl) terminus. A coding sequence can
include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic
mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and
even synthetic DNA sequences. A polyadenylation signal and transcription
termination sequence will usually be located 3' to the coding sequence. A "cDNA"
is defined as copy-DNA or complementary-DNA, and is a product of a reverse
transcription reaction from an mRNA transcript. An "exon" is an expressed
sequence transcribed from the gene locus, whereas an "intron" is a non-
expressed sequence that is from the gene locus.

Transcriptional and translational control sequences are DNA regulatory 
sequences, such as promoters, enhancers, polyadenylation signals, terminators, 
and the like, that provide for the expression of a coding sequence in a host cell.
A "cis-element" is a nucleotide sequence, also termed a "consensus sequence" or
"motif," that interacts with proteins that can upregulate or downregulate
expression of a specific gene locus. A "signal sequence" can also be included
with the coding sequence. This sequence encodes a signal peptide, N-terminal
to the polypeptide, that communicates to the host cell and directs the polypeptide
to the appropriate cellular location. Signal sequences can be found associated 
with a variety of proteins native to prokaryotes and eukaryotes. 

A "promoter sequence" is a DNA regulatory region capable of binding RNA 
polymerase in a cell and initiating transcription of a downstream (3' direction)
coding sequence. For purposes of defining the present invention, the promoter
sequence includes, at its 3' terminus, the transcription initiation site and extends
upstream (in the 5' direction) to include the minimum number of bases or
elements necessary to initiate transcription at levels detectable above
background. Within the promoter sequence will be found a transcription initiation
site, as well as protein binding domains (consensus sequences) responsible for
the binding of RNA polymerase. Eukaryotic promoters often, but not always, 
contain "TATA" boxes and "CAT" boxes. 

Efficient expression of recombinant DNA sequences in eukaryotic cells 
requires expression of signals directing the efficient termination and
polyadenylation of the resulting transcript. Transcription termination signals are
generally found downstream of the polyadenylation signal and are a few hundred 
nucleotides in length. 

As used herein, "an origin of replication" or "origin" refers to any sequence 
capable of directing replication of a DNA construct in a suitable prokaryotic or
eukaryotic host (e.g., the ColE1 origin and its derivatives; the yeast 2 μ origin).
Eukaryotic expression vectors may also contain "viral replicons" or "origins of
replication". Viral replicons are viral DNA sequences which allow for the
extrachromosomal replication of a vector in a host cell expressing the appropriate
replication factors. Vectors which contain either the SV40 or polyoma virus origin
of replication replicate to high copy number (up to 10^4 copies/cell) in cells that
express the appropriate viral T antigen. Vectors which contain the replicons from
bovine papillomavirus or Epstein-Barr virus replicate extrachromosomally at low
copy number (~100 copies/cell).

As used herein, the terms "nucleic acid molecule encoding", "DNA 
sequence encoding", and "DNA encoding" refer to the order or sequence of 
deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these
deoxyribonucleotides determines the order of amino acids along the polypeptide
(protein) chain. The DNA sequence thus codes for the amino acid sequence.

As used herein, the term "gene" means the deoxyribonucleotide
sequences comprising the coding region of a structural gene, i.e., the coding
sequence for a protein or polypeptide of interest, including sequences located
adjacent to the coding region on both the 5' and 3' ends for a distance of about 1
kb on either end, such that the gene corresponds to the length of the full-length
mRNA. The sequences which are located 5' of the coding region and which are
present on the mRNA are referred to as 5' non-translated sequences. The
sequences which are located 3' or downstream of the coding region and which
are present on the mRNA are referred to as 3' non-translated sequences. The
term "gene" encompasses both cDNA and genomic forms of a gene. A genomic
form or clone of a gene contains the coding region interrupted with non-coding
sequences termed "introns" or "intervening regions" or "intervening sequences".
Introns are segments of a gene that are transcribed into heterogeneous nuclear
RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns
are removed or "spliced out" from the nuclear or primary transcript; introns
therefore are absent in the mature messenger RNA (mRNA) transcript. The
mRNA functions during translation to specify the sequence or order of amino
acids in a nascent polypeptide.

In addition to containing introns, genomic forms of a gene may also include
sequences located on both the 5' and 3' end of the sequences that are present on 
the RNA transcript. These sequences are referred to as "flanking" sequences or 
regions (these flanking sequences are located 5' or 3' to the non-translated 
sequences present on the mRNA transcript). The 5' flanking region may contain 
regulatory sequences such as promoters and enhancers which control or 
influence the transcription of the gene. The 3' flanking region may contain 
sequences which direct the termination of transcription, post-transcriptional 
cleavage and polyadenylation. 

As used herein, the term "purified" or "to purify" refers to the removal of 
contaminants from a sample. For example, recombinant Cre polypeptides are 
expressed in bacterial host cells (e.g., as a GST-Cre or (HN)6-Cre fusion protein)
and the Cre polypeptides are purified by the removal of host cell proteins; the 
percent of recombinant Cre polypeptides is thereby enriched or increased in the 
sample. 

As used herein the term "portion" refers to a fraction of a sequence, gene 
or protein. "Portion" may comprise a fraction greater than half of the sequence, 
gene or protein, equal to half of the sequence, gene or protein or less than half of 
the sequence, gene or protein. Typically as used herein, two or more "portions" 
combine to comprise a whole sequence, gene or protein. 

As used herein, the term "fusion protein" refers to a chimeric protein 
containing a protein of interest joined to an exogenous protein fragment. The 
fusion partner may enhance solubility of the protein of interest as expressed in a 
host cell, may provide an affinity tag to allow purification of the recombinant fusion 
protein from the host cell or culture supernatant, or both. If desired, the fusion
partner may be removed from the protein of interest by a variety of enzymatic or
chemical means known to the art. 

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 
Methods are provided for producing a vector that includes at least one 
splicable intron. In the subject methods, intron containing vectors are produced
from donor and acceptor vectors that each include a site specific recombinase
site, where the subject donor and acceptor vectors further include splice donor
and acceptor sites that, upon site specific recombination of the donor and
acceptor vectors, define an intron in the product vector of the recombination step.
Also provided are compositions for use in practicing the subject methods,
including the donor and acceptor vectors themselves, as well as systems and kits
that include the same. The subject invention finds use in a variety of different
applications, including the production of expression vectors that encode C-
terminal tagged fusion proteins, the production of expression vectors that encode
pure protein and not a fusion thereof, and the like.
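
The way in which recombination of a donor and an acceptor vector can define an intron in the product vector may be sketched with a deliberately simplified model. Everything in this sketch is a hypothetical illustration and not a description of any particular vector of the invention: vectors are represented as linear, ordered lists of named elements, recombination is modeled as joining the donor elements upstream of a shared recombinase site to the acceptor elements downstream of it, and the splice sites are treated as lying wholly within the intron.

```python
from typing import List, Optional


def recombine(donor: List[str], acceptor: List[str], site: str = "lox") -> List[str]:
    """Join donor elements 5' of the site to acceptor elements 3' of the site."""
    d, a = donor.index(site), acceptor.index(site)
    return donor[:d] + [site] + acceptor[a + 1 :]


def find_intron(vector: List[str]) -> Optional[List[str]]:
    """Return the elements from splice donor through splice acceptor, if any."""
    if "splice_donor" in vector and "splice_acceptor" in vector:
        start = vector.index("splice_donor")
        end = vector.index("splice_acceptor")
        if start < end:
            return vector[start : end + 1]
    return None


def mature_transcript(vector: List[str]) -> List[str]:
    """Remove the intron (if present) from the ordered element list."""
    intron = find_intron(vector)
    return [e for e in vector if intron is None or e not in intron]


# Hypothetical element layouts, chosen only to illustrate the principle.
donor = ["gene_of_interest", "splice_donor", "lox"]
acceptor = ["lox", "splice_acceptor", "tag", "polyA"]

product = recombine(donor, acceptor)
print(find_intron(donor))          # no intron in the donor alone
print(find_intron(product))        # intron spans the recombinase site
print(mature_transcript(product))  # recombinase site removed by splicing
```

Under these assumptions neither starting vector contains a complete intron, while the product vector contains one that encompasses the recombinase site, so the site is absent from the spliced transcript.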

Before the subject invention is described further, it is to be understood that 
the invention is not limited to the particular embodiments of the invention 
described below, as variations of the particular embodiments may be made and 
still fall within the scope of the appended claims. It is also to be understood that
the terminology employed is for the purpose of describing particular 
embodiments, and is not intended to be limiting. Instead, the scope of the present 
invention will be established by the appended claims. 

In this specification and the appended claims, the singular forms "a," "an"
and "the" include plural reference unless the context clearly dictates otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the 
same meaning as commonly understood to one of ordinary skill in the art to which 
this invention belongs. 


Where a range of values is provided, it is understood that each intervening 
value, to the tenth of the unit of the lower limit unless the context clearly dictates 
otherwise, between the upper and lower limit of that range, and any other stated 
or intervening value in that stated range, is encompassed within the invention. 
The upper and lower limits of these smaller ranges may independently be
included in the smaller ranges, and are also encompassed within the invention,
subject to any specifically excluded limit in the stated range. Where the stated
range includes one or both of the limits, ranges excluding either or both of those 
included limits are also included in the invention. 



Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as commonly understood to one of ordinary skill in the
art to which this invention belongs. Although any methods, devices and materials
similar or equivalent to those described herein can be used in the practice or
testing of the invention, the preferred methods, devices and materials are now
described.

All publications mentioned herein are incorporated herein by reference for
the purpose of describing various invention components that are described in the
publications which might be used in connection with the presently described
invention.

In further describing the subject invention, the subject methods are
reviewed first in greater detail, followed by a review of representative applications
in which the subject methods find use, as well as a review of systems, libraries
and kits for use in practicing the subject methods.

Methods 

As summarized above, the subject invention provides recombinase-based 
methods for producing intron containing vectors. In other words, the subject
invention provides methods of producing vectors that include at least one intron,
where the methods are site specific recombinase based methods. By "site
specific recombinase" based method is meant that the subject methods employ a
recombinase mechanism to produce the subject intron containing vectors. The
recombinase mechanism that is employed in the subject methods is one in which
a recombinase mediates the transfer of a nucleic acid from a donor to an
acceptor vector, where the donor and acceptor vectors each include at least one
recombinase recognition site. A variety of different site specific recombinase
systems suitable for transferring a nucleic acid from a donor to an acceptor vector
are known and may be modified to be useful in the subject invention. Such
systems include those described in U.S. Patent Nos. 5,851,808; 5,888,732; and
U.S. Provisional Application Serial No. 09/616,651, the disclosures of which are
herein incorporated by reference, as well as WO 00/12687 and WO 01/05961, the 
disclosures of the priority documents of which are herein incorporated by 
reference. 

In general, in addition to each including at least one recombinase
recognition site, the donor and acceptor vectors each include at least one splice
site, e.g., a splice donor site or a splice acceptor site. In certain embodiments, the
donor and acceptor vectors each include a single splice site, where in many of
these embodiments, the donor vector includes a splice donor site and the
acceptor vector includes a splice acceptor site. In yet other embodiments, the
donor and acceptor vectors each include splice donor and acceptor sites which
are oriented such that they do not form an intron in the donor vectors but, upon
recombinase-mediated recombination of the donor and acceptor vectors, produce
a resultant vector with two distinct introns. In such designs, the acceptors will
contain one synthetic intron that encompasses the recombinase recognition
sequence and the acceptor partial selectable marker.

Any convenient splice sites (i.e., splice donor and acceptor sites) may be
employed in the vectors of the subject method. Representative splice sites or
sequences, e.g., domains, of interest that may be employed include both splice
sites that require specifically provided factors for splicing, e.g., eukaryotic host
factors (as found in eukaryotic host cells) such that the intron is only spliced in a
eukaryotic host cell or a mimetic (e.g., in vivo or in vitro) environment that
provides all the relevant factors, and splice sites that are self-splicing or
autocatalytic, i.e., do not require specific factors for splicing to occur, and thus are
spliced in both eukaryotic and prokaryotic environments, as well as in vitro
environments. Examples include the splicing elements of Group I and Group II
self-splicing introns found in bacteria and certain cellular organelles, e.g., the
highly conserved in Group I self-splicing intron, P7; the bacterial group II intron L.
lactis Ll.LtrB; the yeast mitochondrial group II introns aI1 and aI2; and the
bacterial group II intron Sinorhizobium meliloti RmInt1 (see Oe Y., et al., 2001; and
Martínez-Abarca, F. and Toro, N., 2000).

Any convenient splice donor and acceptor sites may be
employed. Consensus sequences for the 5' splice donor site and the 3' splice
acceptor site used in RNA splicing are well known in the art (See, Moore, et al.,
1993, The RNA World, Cold Spring Harbor Laboratory Press, p. 303-358). In
addition, modified consensus sequences that maintain the ability to function as 5'
donor splice sites and 3' splice acceptor sites may be used in the practice of the
invention. In certain embodiments, splice-donor sites have a characteristic
consensus sequence represented as: (A/C)AGGURAGU (where R denotes a
purine nucleotide) with the GU in the fourth and fifth positions being required
(Jackson, I. J., Nucleic Acids Research 19: 3715-3798 (1991)). Splice-donor sites
are functionally defined by their ability to effect the appropriate reaction within the
mRNA splicing pathway. An unpaired splice-donor site is defined herein as a
splice-donor site which is present in a donor or acceptor vector, typically a donor
vector, and is not accompanied in the vector by a splice-acceptor site positioned
3' to the unpaired splice-donor site. Upon recombinase-mediated recombination
between the donor and acceptor vectors, the unpaired splice-donor site results in
splicing to a splice-acceptor site originally present in the other vector. A
splice-acceptor site is a sequence which, like a splice-donor site, directs the
splicing of an intron out of a resultant expression cassette produced upon
recombinase-mediated recombination of the donor and acceptor vectors. Acting in
conjunction with a splice-donor site, the splicing apparatus uses a splice-acceptor
site to effect the removal of an intron. Splice-acceptor sites have a characteristic
sequence represented as: YYYYYYYYYYNYAG, where Y denotes any pyrimidine
and N denotes any nucleotide (Jackson, I. J., Nucleic Acids Research 19:3715-
3798 (1991)). For convenience, in the present embodiments, the splice acceptor
sequence is immediately preceded by the intron Branch site and these are
considered here as one unit, although they may be separated. The consensus
Branch site is: YNYYRAY, where Y denotes any pyrimidine, R any purine, and N
denotes any nucleotide.
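
As an illustration only, the three consensus sequences given above can be
translated into regular expressions and used to scan an RNA sequence for
candidate sites. The following Python sketch is not part of the described methods;
the function names (consensus_to_regex, find_sites) and the test sequences are
hypothetical, and a consensus match is merely a candidate, since splice sites are
functionally defined.

```python
import re

# Codes used in the text: R = purine, Y = pyrimidine, N = any nucleotide (RNA alphabet).
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U",
             "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def consensus_to_regex(consensus):
    """Translate a consensus such as '(A/C)AGGURAGU' into a compiled regex."""
    pattern = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            # '(A/C)' means "A or C" at this single position.
            close = consensus.index(")", i)
            options = consensus[i + 1:close].split("/")
            pattern.append("[" + "".join(options) + "]")
            i = close + 1
        else:
            pattern.append(IUPAC_RNA[ch])
            i += 1
    return re.compile("".join(pattern))


SPLICE_DONOR = consensus_to_regex("(A/C)AGGURAGU")
SPLICE_ACCEPTOR = consensus_to_regex("YYYYYYYYYYNYAG")
BRANCH_SITE = consensus_to_regex("YNYYRAY")


def find_sites(rna, regex):
    """Return 0-based start positions of every match, including overlapping ones."""
    return [m.start() for m in re.finditer("(?=" + regex.pattern + ")", rna)]


if __name__ == "__main__":
    # Hypothetical test sequence containing one donor-like motif.
    print(find_sites("GGCAGGUAAGUCC", SPLICE_DONOR))
```

The lookahead wrapper in find_sites is a design choice that reports overlapping
candidates, which matters for the pyrimidine-rich acceptor pattern.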
Specific splice sites of interest include, but are not limited to: (a) the novel
consensus intron sequences and the Human hemoglobin Beta donor and
acceptor sequences described in Liu Z. et al., Anal Biochem 246: 264-267 (1997)
and found in the experimental section, infra; (b) the donor and acceptor
sequences found in the SV40 late 19s and 16s mRNA introns (see pCMV myc
from Clontech); (c) the splice donor and acceptor sequences found in the rabbit
Beta globin intron (found in the vector pCMV-neo-Bam); and the like.

The position of the splice donor and acceptor sequences in the various
donor and acceptor vectors determines the location of the intron in the resultant
product vector and, therefore, the domain that is spliced out of the resultant
vector under appropriate splicing conditions, e.g., in a eukaryotic host cell. Thus,
by knowing how the acceptor and donor vectors recombine into a resultant
vector, one can position the donor and acceptor splice sites in the donor and
acceptor vectors to provide for an intron in any location of the resultant vector,
and therefore removal of any sequence of the resultant vector. For example, the
donor and acceptor splice sites can be positioned to provide for a spliceable
intron in the resultant product vector that includes the 3' recombinase recognition
site, the 5' recombinase recognition site, etc. See, e.g., the experimental section
below for more details with respect to a donor and acceptor vector system in
which the donor and acceptor splice sites are positioned to provide for a resultant
vector in which the 3' recombinase site (lox) is present in a spliceable intron.
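
The relationship between splice site position and the sequence that is spliced
out can be sketched with a simple feature map. The Python example below is a
minimal illustration under stated assumptions: a vector is modeled as an ordered
5' to 3' list of named features, each splice donor is paired with the next splice
acceptor located 3' of it, and all names (Feature, spliceable_introns, the example
product) are hypothetical rather than taken from the experimental section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str   # label for the element, e.g. "lox_3prime"
    kind: str   # "splice_donor", "splice_acceptor", "recombinase_site" or "other"


def spliceable_introns(features):
    """Pair each splice donor with the next splice acceptor located 3' of it.

    Returns (start_index, end_index) pairs, inclusive of the splice sites.
    """
    introns = []
    open_donor = None
    for i, feature in enumerate(features):
        if feature.kind == "splice_donor":
            open_donor = i  # simplification: the most recent donor is used
        elif feature.kind == "splice_acceptor" and open_donor is not None:
            introns.append((open_donor, i))
            open_donor = None
    return introns


def sites_removed_by_splicing(features):
    """Names of recombinase sites that lie inside a spliceable intron."""
    removed = []
    for start, end in spliceable_introns(features):
        removed += [f.name for f in features[start:end + 1]
                    if f.kind == "recombinase_site"]
    return removed


# Hypothetical resultant vector, read 5' to 3'.
product = [
    Feature("promoter", "other"),
    Feature("lox_5prime", "recombinase_site"),
    Feature("gene_of_interest", "other"),
    Feature("SD", "splice_donor"),
    Feature("lox_3prime", "recombinase_site"),
    Feature("SA", "splice_acceptor"),
    Feature("tag", "other"),
]

if __name__ == "__main__":
    print(sites_removed_by_splicing(product))
```

In this hypothetical layout only the 3' recombinase site falls inside the intron;
an acceptor placed 5' of a donor yields no intron at all under this model.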

In many embodiments of interest, the donor and acceptor vectors are
further characterized in that one of the donor and acceptor vectors includes only
one recombinase recognition site, while the other of the donor and acceptor
vectors includes two recombinase recognition sites. As mentioned above, in many
embodiments, the donor vector includes two recombinase recognition sites while
the acceptor vector includes a single recombinase recognition site. In an
alternative embodiment, the donor vector includes a single recombinase
recognition site while the acceptor vector includes two recombinase recognition
sites. Such a system is described in U.S. Application Serial No. 09/616,651, the
disclosure of which is herein incorporated by reference.
A feature of the vectors of these embodiments is that the donor and
acceptor vectors must be able to recombine in the presence of a suitable
recombinase to produce an expression vector as described above, where the
expression vector lacks at least a portion of the initial donor or acceptor vector,
i.e., it is a non-fusion expression vector. As such, the donor and acceptor vectors
must be able to participate in a recombination event that is other than a fusion
event, where by fusion event is meant an event in which two complete vectors are
fused in their entirety into one fused vector, e.g., where two plasmids are fused
together to produce one plasmid that includes all of the material from the initial two
plasmids, i.e., a fusion plasmid. As such, the subject methods of these particular
embodiments are not fusion methods, where such methods are defined as those
methods in which a single vector is produced from two or more initial vectors in
their entirety, such that all of the initial vector material of each parent vector, e.g.,
plasmid, is present in its entirety in the resultant fusion vector.

The donor and acceptor vectors of these particular embodiments are
further characterized in that one of the donor and acceptor vectors includes only
one recombinase recognition site, while the other of the donor and acceptor
vectors includes two recombinase recognition sites. In a first preferred
embodiment, the donor vector includes two recombinase recognition sites while
the acceptor vector includes a single recombinase recognition site. In an
alternative embodiment, the donor vector includes a single recombinase
recognition site while the acceptor vector includes two recombinase recognition
sites. The donor and acceptor vectors of this first, preferred embodiment and this
second, alternative embodiment, are described in greater detail below.

The donor and acceptor vectors described generally above may be linear
or circular, e.g., plasmids, and in many embodiments of the subject invention are
plasmids. Where the donor and acceptor vectors are plasmids, the donor and
acceptor vectors typically range in length from about 2 kb to 200 kb, usually from
about 2 kb to 40 kb and more usually from about 2 kb to 10 kb.

The donor and acceptor vectors are further characterized in certain
embodiments in that all of the recombinase recognition sites on the donor and
acceptor vectors must be recognized by the same recombinase and should be
able to recombine with each other. Within this parameter they may be the
same or different, but in many embodiments are the same. Recombinase
recognition sites, i.e., sequence-specific recombinase target sites, of interest
include: Cre recombinase activity recognized sites, e.g., loxP, loxP2, loxP511,
loxP514, loxB, loxC2, loxL, loxR, loxΔ86, loxΔ117; att; dif; frt; and the like. The
particular recombinase recognition site is chosen, at least in part, based on the
nature of the recombinase to be employed in the subject methods.
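
The requirement that every site on both vectors be recognized by the same
recombinase can be expressed as a simple lookup. The Python sketch below is
illustrative only. The text ties the lox family to Cre; the pairings of frt with
FLP, att with lambda integrase, and dif with XerC/XerD are added here from general
knowledge and are assumptions of the example, as are the function names.

```python
# Site family (name prefix) -> recombinase. Only the lox/Cre pairing is stated
# in the text; the other three pairings are assumptions added for illustration.
SITE_FAMILY_TO_RECOMBINASE = {
    "lox": "Cre",
    "frt": "FLP",
    "att": "lambda integrase",
    "dif": "XerC/XerD",
}


def recombinase_for(site_name):
    """Look up the recombinase for a site name such as 'loxP511' or 'frt'."""
    lowered = site_name.lower()
    for prefix, recombinase in SITE_FAMILY_TO_RECOMBINASE.items():
        if lowered.startswith(prefix):
            return recombinase
    raise ValueError(f"unknown recombinase recognition site: {site_name}")


def shared_recombinase(donor_sites, acceptor_sites):
    """Return the single recombinase that recognizes every site, or None."""
    found = {recombinase_for(s) for s in list(donor_sites) + list(acceptor_sites)}
    return found.pop() if len(found) == 1 else None


if __name__ == "__main__":
    print(shared_recombinase(["loxP", "loxP"], ["loxP"]))
```

This checks only the first condition in the paragraph above. Sites recognized by
the same recombinase are not necessarily able to recombine with each other, so a
non-None result is necessary but not sufficient.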






The Donor Vector 



As mentioned above, in a preferred embodiment of the subject methods,
the donor vector includes two recombinase recognition sites while the acceptor
vector includes a single recombinase recognition site. The donor vector of
these embodiments includes two recombinase recognition sites,
capable of recombining with each other, e.g., site 1A and site 1B, that flank or
border a first or donor domain, i.e., desired donor fragment, where this domain is
the portion of the vector that becomes part of the expression vector produced by
the subject methods. The length of the donor domain may vary, but in many
embodiments ranges from 1 kb to 200 kb, usually from about 1 kb to 10 kb. The
portion of the donor vector that is not part of this donor domain, i.e., the part that
is 5' of site 1A and 3' of site 1B, is referred to herein for clarity as the non-donor
domain of the donor vector.

B, F & F Ref: CLON-069
Clontech Ref: P-90
F:\DOCUMENT\CLON\069\patent application.doc

The two recombinase recognition sites of the donor vector are
characterized in that they are oriented in the same direction and are capable of
recombining with each other. By oriented in the same direction it is meant that
they have the same head to tail orientation. Thus, the orientation of site 1A is the
same as the orientation of site 1B.
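
Whether two sites share the same head to tail orientation can be checked directly
on sequence. The Python sketch below assumes the sites are wild-type loxP and uses
the commonly cited 34 bp loxP sequence (two 13 bp inverted repeats around an
asymmetric 8 bp spacer), which is not given in this section; the function names
and test sequences are hypothetical.

```python
# Commonly cited wild-type loxP sequence (an assumption of this sketch):
# 13 bp repeat + 8 bp asymmetric spacer + 13 bp repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def loxp_sites(seq):
    """Return sorted (position, strand) pairs for every loxP site in seq."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", LOXP), ("-", reverse_complement(LOXP))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def same_orientation(seq):
    """True when seq holds exactly two loxP sites and both are on the same strand."""
    sites = loxp_sites(seq)
    return len(sites) == 2 and sites[0][1] == sites[1][1]


if __name__ == "__main__":
    direct = LOXP + "GGGG" + LOXP                       # same direction
    inverted = LOXP + "GGGG" + reverse_complement(LOXP)  # opposite directions
    print(same_orientation(direct), same_orientation(inverted))
```

Orientation is readable from sequence only because the spacer is asymmetric; the
two 13 bp repeats alone are palindromic with respect to each other.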

The donor domain flanked by the two recombinase recognition sites, i.e.,
the portion of the vector 3' of the first recombinase site 1A and 5' of the second
recombinase site 1B, includes at least the following components: (a) at least one
restriction site and (b) at least a portion of a selectable marker, e.g. a coding
sequence, a promoter, or a complete selectable marker made up of a coding
sequence and a promoter. The donor domain may include at least one restriction
site or a plurality of distinct restriction sites, e.g., as found in a multiple cloning site
or polylinker, where by restriction site is meant a stretch of nucleotides that has a
sequence that is recognized and cleaved by a restriction endonuclease. Where a
plurality of restriction sites are present in the donor domain, the number of distinct
or different restriction sites typically ranges from about 2 to 5, usually from about
2 to 13.

In many embodiments, there are at least two restriction sites, which may or
may not be identical depending on the particular protocol employed to produce
the donor plasmid, that flank a nucleic acid which is a coding sequence for a
protein of interest, where the protein of interest may or may not be known, e.g., it
may be a known coding sequence for a known protein or polypeptide or a coding
sequence for an as yet unidentified protein or polypeptide, such as where this
nucleic acid of interest is a constituent of a library, as discussed in greater detail
below. The length of this nucleic acid of interest may vary greatly, but
generally ranges from about 18 bp to 20 kb, usually from about 100 bp to 10 kb
and more usually from about 1 kb to 3 kb. At least one restriction site and this
nucleic acid of interest, when present, are sufficiently close to the 3'
end of the first flanking recombinase site, i.e., recombinase recognition site 1A,
such that in the expression vector produced from the donor plasmid, expression
of the coding sequence of the nucleic acid of interest is driven by a promoter
positioned 5' of this first recombinase site. As such, the distance separating this
restriction site/nucleic acid of interest from the recombinase site
typically ranges from about 1 bp to 150 bp, usually from about 1 bp to 50 bp.

In a first preferred embodiment, the donor domain also generally includes a
portion of a selectable marker. By portion of a selectable marker is meant a
sub-part of a selectable marker, e.g. a coding sequence or a promoter, which can be
joined with a second subpart to produce a functioning selectable marker that
confers some selectable phenotype on the host cell in which the expression
vector produced by the subject methods is to be propagated. Examples of
subparts of selectable markers are coding sequences and promoters. As such, in
many embodiments, the portion of the selectable marker present on the donor
domain is a coding sequence of a marker gene or a promoter capable of driving
expression of the coding sequence of the marker gene, where in certain preferred
embodiments, the coding sequence of a marker gene is the portion of the
selectable marker present on the donor domain. Examples of coding sequences
of interest include, but are not limited to, the coding sequences from the following
marker genes: the chloramphenicol resistance gene, the ampicillin resistance
gene, the tetracycline resistance gene, the kanamycin resistance gene, the
streptomycin resistance gene and the SacB gene from B. subtilis encoding
sucrase and conferring sucrose sensitivity; and the like. The promoter portions or
sub-parts of this selectable marker are any convenient promoters capable of
driving expression of the selectable marker in the expression vector produced by
the subject methods, see infra, and in many embodiments are bacterial
promoters, where particular promoters of interest include, but are not limited to:
the Ampicillin resistance promoter, the inducible lac promoter, the tet-inducible
promoter from pProTet (PLtetO-1), available from CLONTECH, T7, T3, and SP6
promoters; and the like. The distance of this sub-part or portion of the selectable
marker from the 3' end of the second recombinase recognition site, i.e., site 1B, is
sufficient to provide for expression of the marker to occur in the final expression
vector, where the other part of selectable marker that is required for efficient
expression of the selectable marker is present on the other side, i.e., the 5' side
of the adjacent recombinase recognition site. This distance typically ranges from
about 1 bp to 2.5 kb, usually from about 1 bp to 500 bp.

The length of the donor domain flanked by the first and second
recombinase sites of the donor plasmid, i.e., the length of the desired donor
fragment, may vary greatly, so long as the above described components are
present on the donor domain. Generally, the length is at least about 100 bp,
usually at least about 500 bp and more usually at least about 900 bp, where the
length may be as great as 100 kb or greater, but generally does not exceed about
20 kb and usually does not exceed about 10 kb. Typically, the length of the donor
domain ranges from about 100 bp to 100 kb, usually from about 500 bp to 20 kb
and more usually from about 900 bp to 10 kb.
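
The nested length ranges in the preceding paragraph can be summarized as a small
classifier. The Python sketch below is illustrative only: it treats the "about"
figures as hard cutoffs, which the text does not require, and the function name
and tier labels are hypothetical.

```python
def donor_domain_length_tier(length_bp):
    """Classify a donor-domain length (in bp) against the ranges quoted in the text.

    The cutoffs are the quoted "about" values, used here as exact thresholds
    purely for illustration.
    """
    if length_bp < 100:
        return "below stated minimum"
    if 900 <= length_bp <= 10_000:
        return "more usual"        # about 900 bp to 10 kb
    if 500 <= length_bp <= 20_000:
        return "usual"             # about 500 bp to 20 kb
    if length_bp <= 100_000:
        return "typical"           # about 100 bp to 100 kb
    return "above typical range"   # permitted by the text ("100 kb or greater")


if __name__ == "__main__":
    for length in (50, 200, 600, 1_500, 15_000, 50_000, 150_000):
        print(length, donor_domain_length_tier(length))
```

The narrowest range is tested first so that each length is reported under the most
specific tier it satisfies.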

In addition to the above described components, the donor vector may 
include a number of additional elements, where desired, that are present on the 
non-donor domain or non-desired donor fragment of the donor vector. For 
example, the non-donor domain generally includes an origin of replication. This 
origin of replication may be any convenient origin of replication or ori site, where a 
number of ori sites are known in the art, where particular sites of interest include, 
but are not limited to: ColE1 and its derivatives, pMB1, other origins that function 
in prokaryotic cells, the yeast 2 micron origin and the like. Also present on this 
non-donor domain of certain preferred embodiments is a selective marker gene 
that provides for negative selection of the non-donor domain under particular 
conditions, e.g., negative selection conditions. This marker is fully functional and 
therefore is made up of a coding sequence operably linked to an appropriate
promoter, i.e., is provided by a functional expression module or cassette. Markers 
of interest that are capable of providing for this negative selection include, but are 
not limited to: SacB, providing sensitivity to sucrose; ccdB; and the like. 

This non-donor domain of the donor vector may further include one or
more additional components or elements that impart additional functionality to the
donor vector. For example, the donor vector may be a vector that is specifically
designed for use in conjunction with a yeast two hybrid assay protocol, e.g., such
that one can determine whether the gene of interest present in the donor domain
encodes a product that binds to a second protein prior to transferal of the gene of
interest to an expression vector. In such embodiments, the non-donor domain
typically includes the following additional elements: yeast origins of replication,
e.g., the yeast 2 micron origin; yeast selection markers, e.g., URA3, Leu, and trp
selection markers; and peptide fragments of yeast transcription factors that are
expressed as translational fusions to the gene encoded within the donor-domain;
where yeast two hybrid systems are known to those of skill in the art and
described in: Fields, S. and O-K. Song. 1989. A novel genetic system to detect
protein-protein interactions. Nature 340:245-246; Fields, S. and R. Sternglanz.
1994. The two-hybrid system: an assay for protein-protein interactions. Trends
Genet 10: 286-292 and the MATCHMAKER system III user manual, available
from CLONTECH.

In other embodiments, the non-donor domain and/or donor domains may
contain yet other functional elements that provide specific functions to the donor.
For example, donor vectors can be designed that would also function as
prokaryotic expression vectors that express the gene of interest encoded on the
donor domain in prokaryotic cells either as a native protein or fused to an affinity
or epitope tag. Such vectors may include the following elements in their non-donor
or donor domains (e.g., 3' of the multiple cloning site): inducible bacterial
promoters, such as the lac promoter or the PLtetO-1 promoter; affinity or epitope
tags, e.g., GST, 6x(HN), myc-tag, HA-Tag, GFP and its derivatives. Donor
vectors designed to function as retroviral vectors would additionally include
retroviral LTRs and packaging signals in the non-donor domain. Donor vectors for
expression in mammalian cells might also encode affinity or epitope tags, e.g.,
GST, 6x(HN), myc-tag, HA-Tag, GFP and its derivatives; and mammalian
constitutive or inducible promoters, e.g., the CMV promoter, the tet-inducible
promoter, the TK promoter; viral promoters, e.g., T7, T3, SP6.

In a preferred form of this particular embodiment of the subject invention, the donor
vector is as follows. The donor-partial selectable marker comprises the open
reading frame (ORF) for a selectable marker gene, and is placed between the two
donor sequence-specific recombinase target sites, adjacent to the second-donor
sequence-specific recombinase target site. In a more preferred embodiment of
the donor construct, the open reading frame of the selectable marker is situated
such that its 5' to 3' orientation is opposite that of the two donor
sequence-specific recombinase target sites.

In another embodiment of the donor construct, the donor construct is a
closed circle (e.g., a plasmid or cosmid) comprising, in addition to the two donor
sequence-specific recombinase target sites, the unique restriction site or
polylinker and the selectable marker gene open reading frame, at least one origin
of replication, and at least one donor-functional selectable marker gene. The
methods of the present invention should not be limited by the origin of replication
selected. For example, origins such as those found in the pUC series of plasmid
vectors or of the pBR322 plasmid may be used, as well as others known in the
art. Those skilled in the art know that the choice of origin depends on the
application for which the donor construct is intended and/or the host strain in
which the construct is to be propagated.

A variety of selectable marker genes may be utilized, either for the
donor-partial selectable marker or for the donor-functional selectable marker, and such
genes may confer either positive- or negative-resistance phenotypes; however,
the donor-partial and the donor-functional selectable marker genes should be
different from one another. In a preferred embodiment, the selectable markers
are selected from the group consisting of the chloramphenicol resistance gene,
the ampicillin resistance gene, the tetracycline resistance gene, the kanamycin
resistance gene, the streptomycin resistance gene and the sacB gene from B.
subtilis encoding sucrase and conferring sucrose sensitivity. In a more preferred
embodiment, the donor-partial selectable marker is a portion of the gene (e.g., the
open reading frame) for chloramphenicol resistance and the donor-functional
selectable marker gene is the gene for ampicillin resistance. In another preferred
embodiment of the donor construct, the origin of replication and the
donor-functional selectable marker gene lie 5' of the first-donor sequence-specific
recombinase target site.

In another embodiment of the present invention, there is provided a donor
construct with all the above-described features, but additionally having a marker
gene different from either the donor-functional selectable marker gene or the
donor-partial selectable marker gene, wherein the additional marker gene is
positioned 5' of the first sequence-specific recombinase target site such that upon
combination with a recombinase, the additional marker gene is located on the
undesired second donor fragment. This marker gene provides an additional
screen to exclude any products that result in recombinants containing the second
donor fragment. The marker gene could be, for example, LacZ. In this case,
incorrect recombinants would generate blue colonies on X-Gal plates.
Alternatively, a more preferred additional marker would be the sacB gene
conferring sucrose sensitivity. In this case, any incorrect clones would be killed
when grown on sucrose containing medium. The additional marker provides
another screen, thereby enhancing the system by further ensuring that only
correct recombination products are obtained following recombination and
transformation.

In yet another embodiment of the donor construct, the donor construct
further comprises a termination sequence placed 3' of the restriction site or
polylinker sequence but 5' of the second-donor sequence-specific recombinase
target site. In a most preferred embodiment, the termination sequence is placed
5' of the 3' end of the donor-partial selectable marker (e.g. the ORF of the
selectable marker gene in the preferred embodiment which is in the 5' to 3'
orientation opposite that of both donor sequence specific recombinase target
sites). The present embodiment is not limited by the termination sequence
chosen. In one embodiment, the termination sequence is the T1 termination
sequence; however, a variety of termination sequences are known to the art and
may be employed in the nucleic acid constructs of the present invention, including
the T6S, TINT, TL1, TL2, TR1, and TR2 termination signals derived from the
bacteriophage lambda, and termination signals derived from bacterial genes such
as the trp gene of E. coli.

In another preferred embodiment of the donor construct, the donor
construct further comprises a polyadenylation sequence placed 3' of the unique
restriction site(s) or polylinker but 5' of the second-donor sequence-specific
recombinase target site. In a most preferred embodiment, the polyadenylation
sequence is placed 5' of the 3' end of the open reading frame of the selectable
marker gene similar to the placement described for the termination sequence
supra. The present invention should not be limited by the nature of the
polyadenylation sequence chosen. In one embodiment, the polyadenylation
sequence is selected from the group consisting of the bovine growth hormone
polyadenylation sequence, the simian virus 40 polyadenylation sequence and the
Herpes simplex virus thymidine kinase polyadenylation sequence.

Also, in a preferred embodiment, the donor construct further comprises a
gene or DNA sequence of interest inserted into the unique restriction enzyme site
or polylinker. The present invention should not be limited by the size of the DNA
of interest inserted into the unique restriction site or polylinker nor the source of
DNA (e.g., genomic libraries, cDNA libraries, etc.).

Thus, in a most preferred embodiment of the donor nucleic acid construct,
there is provided, in 5' to 3' order: a) a first-donor sequence-specific recombinase
target site; b) a nucleic acid or gene of interest; c) termination and
polyadenylation sequences; d) an open reading frame for a selectable marker
gene in a 5' to 3' orientation opposite to that of the first-donor sequence-specific
recombinase target site; e) a second-donor sequence-specific recombinase target
site in the same 5' to 3' orientation as the first donor sequence-specific
recombinase target site, wherein the second-donor sequence-specific
recombinase target site is able to recombine with said first-donor
sequence-specific recombinase target site; f) an origin of replication; and g) a
donor-functional selectable marker gene.
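
The ordered layout a) through g) lends itself to a simple list model that shows
which elements reach the expression vector. The Python sketch below is a
simplified illustration under stated assumptions: vectors are ordered lists of
labels, the acceptor has a single site flanked by an expression promoter and an
oppositely oriented marker promoter, and recombination is modeled as replacing the
acceptor site with the site-flanked donor domain. All labels and function names
are hypothetical.

```python
# Donor construct elements a) to g), read 5' to 3'. Labels are illustrative.
DONOR = [
    "site_1A",            # a) first-donor recombinase target site
    "gene_of_interest",   # b) nucleic acid or gene of interest
    "terminator_polyA",   # c) termination and polyadenylation sequences
    "marker_ORF(rev)",    # d) marker ORF, opposite orientation to the sites
    "site_1B",            # e) second-donor site, same orientation as site 1A
    "origin",             # f) origin of replication
    "functional_marker",  # g) donor-functional selectable marker gene
]

# Assumed acceptor layout: single site between two oppositely oriented promoters.
ACCEPTOR = ["expression_promoter", "site_2", "marker_promoter(rev)", "acceptor_backbone"]


def donor_domain(donor, first="site_1A", second="site_1B"):
    """Elements strictly between the two donor sites (the desired donor fragment)."""
    i, j = donor.index(first), donor.index(second)
    if i >= j:
        raise ValueError("site 1A must lie 5' of site 1B")
    return donor[i + 1:j]


def recombine(donor, acceptor, acceptor_site="site_2"):
    """Non-fusion product: the site-flanked donor domain replaces the acceptor site."""
    k = acceptor.index(acceptor_site)
    return acceptor[:k] + ["site"] + donor_domain(donor) + ["site"] + acceptor[k + 1:]


if __name__ == "__main__":
    print(recombine(DONOR, ACCEPTOR))
```

In this model the origin of replication and the donor-functional marker stay
behind on the non-donor fragment, and the marker ORF ends up next to the
acceptor's marker promoter, which is how the partial marker becomes functional
only in the recombinant.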

In addition to the above features, the donor vector also includes at least
one splice site, e.g., a splice donor and/or splice acceptor site. Two representative
and non-limiting embodiments are now reviewed. In certain embodiments, the
donor vector includes a splice donor site that is positioned to provide for an intron
flanking the 3' sequence specific recombinase site in the product vector. In these
embodiments, the splice donor site is positioned between the 5' and 3' sequence
specific recombinase sites and, more usually, 3' of the multiple cloning site or
gene of interest and 5' of the second sequence specific recombinase site. These
embodiments find use in producing vectors that express the gene of interest as a
C-terminal tagged fusion, as a product that does not include sequence encoded
by the 3' sequence specific recombinase site, etc. In certain embodiments, the
donor vector also includes a splice acceptor site that is immediately 3' of the 5'
sequence specific recombinase site. Since the splice acceptor is 5' of the splice
donor site in the vector, the two splice sites do not make a spliceable intron in the
donor vector. However, upon recombination with an appropriate acceptor vector,
a product vector in which both the 5' and 3' sequence specific recombinase sites
are present in distinct introns can be produced. These embodiments are useful in
applications where one wishes to express a protein from the product vector in a
manner that is free of any residues encoded by the 5' and 3' sequence specific
recombinase sites.

The Acceptor Vector

As mentioned above, in a preferred embodiment of the subject invention, 
the acceptor vector employed in the subject methods is a vector that includes a 
single recombinase site. In these embodiments, the single recombinase site is 
flanked on one side by a promoter and on the other side, in certain preferred 
embodiments, by a portion of a selectable marker, e.g., a promoter or a coding 
sequence, where in many preferred embodiments described further below, this 
portion or sub-part of the selectable marker is a second promoter, e.g., a bacterial 
promoter. In these embodiments, the single recombinase site is flanked by two 
oppositely oriented promoters, where one of the promoters drives expression of the 
gene of interest in the expression vector produced by the subject methods and 
the second promoter drives expression of the coding sequence of the 
recombinant-functional selectable marker in the expression vector produced by 
the subject methods. In these embodiments, the first promoter is a promoter that 
is capable of driving expression of the gene of interest in the expression vector, 
where representative promoters include, but are not limited to: the CMV promoter; 
the tet-inducible promoter; retroviral LTR promoter/enhancer sequences; the TK 
promoter; bacterial promoters, e.g., the lac promoter and the PLtetO-1 promoter; the 
yeast ADH promoter; and the like. The distance between the first promoter and the 
recombinase site is one that allows for expression in the final expression vector, 
where the distance typically ranges from about 1 bp to 1000 bp, usually from 
about 10 bp to 500 bp. The second promoter is a promoter that is capable of 
driving expression of the recombinant-functional selectable marker, and is 
generally a bacterial promoter. Bacterial promoters of interest include, but are not 
limited to: the ampicillin promoter, the lac promoter, the PLtetO-1 promoter, the T7 
promoter and the like. The distance between the bacterial promoter and the 
recombinase site is sufficient to provide for expression of the selectable marker in 
the expression vector and typically ranges from about 1 bp to 2.5 kb, usually from 
about 1 bp to 200 bp. 
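
The layout constraints stated above (two oppositely oriented promoters flanking a single recombinase site, each within a typical distance of the site) can be captured in a short check. The following is a minimal sketch only: the class name, field names and function name are hypothetical and introduced for illustration, and only the numeric ranges are taken from the description.

```python
from dataclasses import dataclass

# Typical distance ranges (in bp) quoted in the description above.
FIRST_PROMOTER_RANGE = (1, 1000)    # "about 1 bp to 1000 bp"
SECOND_PROMOTER_RANGE = (1, 2500)   # "about 1 bp to 2.5 kb"


@dataclass
class AcceptorLayout:
    """Hypothetical summary of an acceptor vector around its single site."""
    first_promoter_strand: str       # "+" or "-"
    second_promoter_strand: str      # "+" or "-"
    first_promoter_to_site_bp: int   # gap between first promoter and site
    second_promoter_to_site_bp: int  # gap between second promoter and site


def layout_problems(layout: AcceptorLayout) -> list[str]:
    """Return a list of problems; an empty list means the layout fits."""
    problems = []
    if layout.first_promoter_strand == layout.second_promoter_strand:
        problems.append("promoters are not oppositely oriented")
    lo, hi = FIRST_PROMOTER_RANGE
    if not lo <= layout.first_promoter_to_site_bp <= hi:
        problems.append("first promoter distance outside 1-1000 bp")
    lo, hi = SECOND_PROMOTER_RANGE
    if not lo <= layout.second_promoter_to_site_bp <= hi:
        problems.append("second promoter distance outside 1-2500 bp")
    return problems
```

For example, `layout_problems(AcceptorLayout("+", "-", 50, 20))` returns an empty list, whereas a layout with two same-strand promoters and a 5000 bp gap reports two problems. The ranges are described as typical rather than limiting, so a reported problem is a prompt for review and not a rejection.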

As indicated above, in yet other preferred embodiments the acceptor 
vector lacks the portion or subpart of the selectable marker. In these 
embodiments, the acceptor vector may be used with a donor vector that includes 
a complete positive selectable marker in the desired donor fragment flanked by 
the two recombinase sites, i.e., the donor vector portion located between the 3' 
end of the first recombinase site and the 5' end of the second recombinase site. 
Alternatively, the acceptor vector may be used with a donor vector that only 
includes a partial positive selectable marker, as described above, where the 
partial marker is nonetheless functional in the resultant expression vector. 

The acceptor vector of the embodiments described above may include a 
number of additional components or elements which are requisite or desired 
depending on the nature of the expression vector to be produced from the 
acceptor vector. In many embodiments of the subject invention, the acceptor 
vector is an acceptor nucleic acid construct comprising: a) an origin of replication 
capable of replicating the final desired recombination construct or expression 
vector; b) an acceptor sequence-specific recombinase target site having a 
defined 5' to 3' orientation; c) a first promoter adjacent to the 5' end of the 
acceptor sequence-specific recombinase target site; and d) an acceptor-partial 
selectable marker, wherein the acceptor-partial selectable marker is capable of 
recombining with a donor-partial selectable marker from a donor construct (or first 
donor fragment, once the donor construct is resolved) so creating a 
recombinant-functional selectable marker in a final desired recombination construct. 
As in the donor construct, the acceptor construct is not limited by the nature of the 
sequence-specific recombinase target site employed, and in preferred 
embodiments the sequence-specific recombinase target site may be selected 
from the group consisting of loxP, loxP2, loxP511, loxP514, loxB, loxC2, loxL, 
loxR, loxΔ86, loxΔ117, loxP3, loxP23, att, dif, and frt. The acceptor 
sequence-specific recombinase target site from the acceptor construct does not have 
to be identical to those on the donor construct; however, the sequence-specific 
recombinase target sites on the acceptor and donor constructs must be able to 
recombine with each other. 
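
The requirement that donor and acceptor sites be able to recombine with each other can be screened at a coarse level by checking that both sites belong to the family recognized by the chosen recombinase. The sketch below is illustrative only: the grouping of sites into "families" by name prefix and all function names are assumptions made for this example, and the recombinase-to-site pairings are those named in this description (Cre with lox sites, FLP with frt, Int with att, XerC/XerD with dif). Membership in the same family is a necessary condition and not a sufficient one, since variant sites within a family may recombine poorly or not at all with one another.

```python
# Recombinase-to-site-family pairings named in this description.
SITE_FAMILY_BY_RECOMBINASE = {
    "Cre": "lox",
    "FLP": "frt",
    "Int": "att",
    "XerC/XerD": "dif",
}


def site_family(site_name: str) -> str:
    """Map a site name such as 'loxP511' to its family prefix ('lox')."""
    for family in ("lox", "att", "dif", "frt"):
        if site_name.lower().startswith(family):
            return family
    raise ValueError(f"unrecognized site: {site_name}")


def may_recombine(donor_site: str, acceptor_site: str, recombinase: str) -> bool:
    """Coarse screen: both sites are in the family the recombinase recognizes.

    A True result does not guarantee recombination between two variant sites;
    that still has to be established experimentally.
    """
    family = SITE_FAMILY_BY_RECOMBINASE[recombinase]
    return site_family(donor_site) == family == site_family(acceptor_site)
```

Under these assumptions, `may_recombine("loxP", "loxP", "Cre")` is true, while pairing a loxP site with an frt site is rejected for any recombinase.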

In a preferred embodiment, the acceptor-partial selectable marker is a 
second promoter, wherein the second promoter is oriented such that its 5' to 3' 
orientation is opposite that of the acceptor sequence-specific recombinase target 
site and the first promoter, and wherein the 3' end of the second promoter is 
adjacent to the 3' end of the acceptor sequence-specific recombinase target site. 

The acceptor construct is not limited by the nature of the origin of 
replication employed. A variety of origins of replication are known in the art and 
may be employed on the acceptor nucleic acid constructs of the present 
invention. Those skilled in the art know that the choice of origin depends on the 
application for which the acceptor construct is intended and/or the host strain in 
which the construct is to be propagated. In the case of the acceptor construct, 
the origin of replication is chosen appropriately such that both the acceptor 
construct and the final desired recombination construct will be able to replicate in 
the given host cell. 

The acceptor construct also is not limited by the nature of the promoters 
employed. Those skilled in the art know that the choice of the promoter depends 
upon the type of host cell to be employed for expressing a gene(s) under the 
transcriptional control of the chosen promoter. A wide variety of promoters 
functional in viruses, prokaryotic cells and eukaryotic cells are known in the art 
and may be employed in the acceptor nucleic acid constructs of the present 
invention. In a preferred embodiment of the invention, the donor construct 
contains a gene or DNA sequences of interest and when the donor construct 
recombines with the acceptor construct, the first promoter of the acceptor 
construct is positioned such that it will drive expression of the gene or DNA 
sequences of interest. Thus, a promoter capable of driving the gene or DNA 
sequences of interest should be chosen for the first promoter. Further, in a 
preferred embodiment of the present invention, the acceptor-partial selectable 
marker is a promoter capable of driving the expression of the donor-partial 
selectable marker ORF from the donor construct (e.g., the promoter for the 
ampicillin resistance gene from the plasmid pUC19) or a viral promoter including, 
but not limited to, the T7, T3, and SP6 promoters. 

In yet another preferred embodiment of the acceptor construct, the 
acceptor construct additionally includes a DNA sequence encoding a peptide 
affinity domain or peptide tag sequence, wherein the affinity domain or tag 
sequence is 3' of the first promoter and 5' of the acceptor sequence-specific 
recombinase target site, such that the expression of the affinity domain or tag 
sequence is under control of the first promoter, and such that it is in the same 
translational frame as the acceptor sequence-specific recombinase target site. 
The present invention is not limited by the nature of the affinity domain or tag 
sequence employed; a variety of suitable affinity domains are known in the art, 
including glutathione-S-transferase, the maltose binding protein, protein A, protein 
L, polyhistidine tracts, etc.; and tag sequences include, but are not limited to the 
c-Myc Tag, the HA Tag, the FLAG tag, Green Fluorescent Protein (GFP), etc. 

In another preferred embodiment of the acceptor vector construct, the 
acceptor construct additionally includes a DNA sequence encoding a peptide 
affinity domain or peptide tag sequence, wherein the affinity domain or tag 
sequence is 3' of an intron splice acceptor sequence placed in the acceptor 
vector 3' of the partial selectable marker, such that when this vector is 
recombined with a donor vector of the invention having an appropriately 
positioned intron splice donor sequence, an expression cassette is generated 
having a functional synthetic intron and in which the expression of the affinity 
domain or tag sequence is under control of the first promoter of the acceptor 
vector, and such that it is in the same translational frame as a gene of interest 
placed within the donor vector. The present invention is not limited by the nature 
of the affinity domain or tag sequence employed; a variety of suitable affinity 
domains are known in the art, including glutathione-S-transferase, the maltose 
binding protein, protein A, protein L, polyhistidine tracts, etc.; and tag sequences 
include, but are not limited to the c-Myc Tag, the HA Tag, the FLAG tag, Green 
Fluorescent Protein (GFP), etc. Since this tag and the gene of interest are 
in-frame, following splicing, they will be expressed as a single fusion protein, with 
the Tag being at the C-terminus of the protein. 
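
The effect of the synthetic intron on the reading frame can be illustrated with a toy transcript. The sketch below is a simplified model under stated assumptions: the gene, the 34-nucleotide placeholder standing in for a recombinase site, and the hexahistidine tag are all hypothetical sequences chosen for this example; the intron is recognized only by its terminal GT and AG dinucleotides (real splice sites require more extensive consensus sequences); and the transcript is written with DNA letters for simplicity. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate codon by codon from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def splice(transcript: str, intron_start: int, intron_end: int) -> str:
    """Remove transcript[intron_start:intron_end] after checking GT...AG."""
    intron = transcript[intron_start:intron_end]
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron must start with GT and end with AG")
    return transcript[:intron_start] + transcript[intron_end:]


GENE = "ATGGCTAAA"                   # hypothetical gene of interest, no stop codon
SITE = "C" * 34                      # placeholder standing in for a recombinase site
INTRON = "GT" + SITE + "AG"          # splice donor ... site ... splice acceptor
TAG = "CATCACCATCACCATCAC" + "TAA"   # hexahistidine tag followed by a stop codon

pre_mrna = GENE + INTRON + TAG
mature = splice(pre_mrna, len(GENE), len(GENE) + len(INTRON))
fusion = translate(mature)           # gene product fused in frame to the tag
```

In this toy case the spliced message encodes the gene product followed directly by the tag, and no residue is contributed by the placeholder site because the site lies wholly within the removed intron.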

In another preferred embodiment of the acceptor construct, the acceptor 
construct further includes an acceptor-functional selectable marker. The present 
invention is not limited by the nature of the acceptor-functional selectable marker 
chosen and the selectable marker gene may result in positive or negative 
selection. In a preferred embodiment, the acceptor-functional selectable marker 
gene is selected from the group consisting of the chloramphenicol resistance 
gene, the ampicillin resistance gene, the tetracycline resistance gene, the 
kanamycin resistance gene, the streptomycin resistance gene and the sacB 
gene. 

In addition to one or more of the above described components, the 
acceptor vectors may include a number of additional components that impart 
specific function to the expression vectors that are produced from the acceptor 
vector according to the subject methods. Additional elements that may be present 
on the subject acceptor vectors include, but are not limited to: (a) elements 
requisite for generating vectors suitable for use in yeast two hybrid expression 
assays, e.g., a GAL4 activation domain coding sequence, a GAL4 DNA-binding 
domain coding sequence (as found in pLP-GADT7 and pLP-GBKT7 shown in 
Figs. 3A & 3B); (b) elements necessary for study of the localization of a protein in 
a cell, e.g., tagging elements such as fluorescent protein coding sequences, such 
as the GFP coding sequences; (c) elements necessary for constitutive, bicistronic 
expression in mammalian cells, e.g., IRES sites, in combination with selectable 
markers, e.g., antibiotic resistance, fluorescent protein, etc.; (d) elements 
necessary for inducible expression of the gene of interest on an expression 
vector, e.g., inducible promoters such as the tet-responsive promoter, etc.; (e) 
elements that provide for retroviral expression vectors; and the like. 

In addition to the above requisite and optional elements, the acceptor 
vectors further include at least one splice site. Two representative but non-limiting 
embodiments are now described further. In a first embodiment, the acceptor 
vector includes a splice acceptor site positioned 3' of the single sequence specific 
recombinase site of the vector. More precisely, this splice acceptor sequence is 
placed 3' of the acceptor partial selectable marker sequence. This embodiment 
finds use in applications where one wishes to produce expression vectors in 
which the gene of interest is not expressed as a fusion with 3' sequence specific 
recombinase site encoded domains, etc. In a second representative 
embodiment, the acceptor vector further includes a splice donor site which is 
positioned 5' of the single sequence specific recombinase site, where this 
embodiment finds use in those situations where one wishes to produce an 
expression vector in which the gene of interest is expressed as a protein that 
does not include either N- or C-terminal residues encoded by the 5' and 3' 
sequence specific recombinase sites. 

Product Vector Generation with a Recombinase 

As mentioned above, in the subject methods the donor and acceptor 
vectors are contacted with a recombinase under conditions sufficient for site 
specific recombination to occur, specifically under conditions sufficient for a 
recombinase mediated recombination event to occur that produces the desired 
intron containing product vector, where product vector production is accomplished 
without cutting or ligation of the donor and acceptor vectors with restriction 
endonucleases and nucleic acid ligases. The contact may occur under in vitro or 
in vivo conditions, as is desired and/or convenient. 

In many embodiments, an aqueous reaction mixture is produced by 
combining the donor and acceptor vectors and the recombinase with water and 
other requisite and/or desired components to produce a reaction mixture that, 
under appropriate conditions, results in production of the desired expression 
vector. The various components may be combined separately or simultaneously, 
depending on the nature of the particular component and how the components 
are combined. Conveniently, the components of the reaction mixture are 
combined in a suitable container. The amounts of donor and acceptor vectors that 
are present in the reaction mixture are sufficient to provide for the desired 
production of the expression vector product, where the amounts of donor and 
acceptor vector may be the same or different, but are in many embodiments 
substantially the same if not the same. In many embodiments, the amount of 
donor and acceptor vector that is present in the reaction mixture ranges from 
about 50 ng to 2 µg, usually from about 100 ng to 500 ng and more usually from 
about 150 ng to 300 ng, for a reaction volume ranging from about 5 µl to 1000 µl, 
usually from about 10 µl to 50 µl. 

The recombinase that is present in the reaction mixture is one that 
provides for recombination of the donor and acceptor vectors, i.e., one that 
recognizes the recombinase recognition sites on the donor and acceptor vectors. 
As such, the recombinase employed will vary, where representative 
recombinases include, but are not limited to: recombinases, transposases and 
integrases, where specific recombinases of interest include, but are not limited to: 
Cre recombinase (the cre gene has been cloned and expressed in a variety of 
hosts, and the enzyme can be purified to homogeneity using standard techniques 
known in the art; purified Cre protein is available commercially from CLONTECH, 
Novagen, NEB, and others); FLP recombinase of S. cerevisiae that recognizes 
the frt site; Int recombinase of bacteriophage Lambda that recognizes the att site; 
xerC and xerD recombinases of E. coli, which together form a recombinase that 
recognizes the dif site; the Int protein from the Tn916 transposon; the Tn3 
resolvase; the Hin recombinase; the Cin recombinase; the immunoglobulin 
recombinases; and the like. While the amount of recombinase present in the 
reaction mixture may vary depending on the particular recombinase employed, in 
many embodiments the amount ranges from about 0.1 units to 1250 units, usually 
from about 1 unit to 10 units and more usually from about 1 unit to 2 units, for the 
above described reaction volumes. The aqueous reaction mixture may include 
additional components, e.g., a reaction buffer or components thereof, e.g., 
buffering compounds, such as Tris-HCl; MES; sodium phosphate buffer; sodium 
acetate buffer; and the like, which are often present in amounts ranging from 
about 10 mM to 100 mM, usually from about 20 mM to 50 mM; monovalent ions, 
e.g., sodium, chloride, and the like, which are typically present in amounts 
ranging from about 10 mM to 500 mM, usually from about 30 mM to 150 mM; 
divalent cations, e.g., magnesium, calcium and the like, which are often present in 
amounts ranging from about 1 mM to 20 mM, usually from about 5 mM to 10 mM; 
and other components, e.g., BSA, EDTA, spermidine and the like; etc. (where the 
above amount ranges are provided for the representative reaction volumes 
described above). As the reaction mixtures are aqueous reaction mixtures, they 
also include water. 
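
Assembling a reaction mixture to the concentrations given above is a dilution calculation (stock concentration times stock volume equals final concentration times final volume). The sketch below shows one way to work out component volumes. It is an illustration under assumptions: the function name, the stock concentrations, and the 3 µl allowance for vectors and recombinase are hypothetical, while the final concentrations and the 20 µl reaction volume are chosen to fall within the ranges stated in the description.

```python
def component_volumes(final_volume_ul, components, other_volume_ul=0.0):
    """Compute the volume of each stock needed for one reaction.

    components maps a name to (stock_mM, final_mM). other_volume_ul is the
    volume already committed to vectors and enzyme. Water makes up the rest.
    """
    volumes = {}
    for name, (stock_mM, final_mM) in components.items():
        # C1 * V1 = C2 * V2, solved for the stock volume V1.
        volumes[name] = final_volume_ul * final_mM / stock_mM
    water = final_volume_ul - sum(volumes.values()) - other_volume_ul
    if water < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


# Hypothetical stocks; final concentrations lie within the stated ranges.
example = component_volumes(
    final_volume_ul=20.0,
    components={
        "Tris-HCl": (1000.0, 50.0),  # buffer, 50 mM final
        "NaCl": (1000.0, 30.0),      # monovalent ions, 30 mM final
        "MgCl2": (100.0, 10.0),      # divalent cation, 10 mM final
    },
    other_volume_ul=3.0,             # donor, acceptor and recombinase
)
```

With these assumed stocks, the 20 µl reaction takes 1 µl of buffer stock, 0.6 µl of salt stock and 2 µl of magnesium stock, with water making up the remainder after the vectors and enzyme are accounted for.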

The subject reaction mixtures are typically prepared at temperatures 
ranging from about 0 °C to 4 °C, e.g., on ice, to minimize enzyme activity. Following 
reaction mixture preparation, the temperature of the reaction mixture is typically 
raised to a temperature that provides for optimum or maximal recombinase 
activity, and concomitantly expression vector production. Often, in this portion of 
the method the temperature will be raised to a temperature ranging from about 
4 °C to 37 °C, usually from about 10 °C to 25 °C, where the mixture will be 
maintained at this temperature for a period of time sufficient for the desired 
amount of expression vector production to occur, e.g., for a period of time ranging 
from about 5 mins to 60 mins, usually from about 10 mins to 15 mins. Following 
the incubation period, the reaction mixture is subjected to conditions sufficient to 
inactivate the recombinase, e.g., the temperature of the reaction mixture may be 
raised to a value ranging from about 65 °C to 70 °C for a period of time ranging 
from about 5 mins to 10 mins. 

Alternatively, contact of the donor and acceptor vectors with the 
recombinase may occur in vivo, where the donor and acceptor vectors are 
introduced in a suitable host cell that expresses a recombinase. In this 
embodiment, the recombination between the donor and acceptor vectors may be 
accomplished in vivo using a host cell that transiently or constitutively expresses 
the appropriate site-specific recombinase (e.g., Cre recombinase expressed in 
the bacterial strain BNN132, available from CLONTECH). pDonor and pAcceptor, 
i.e., the donor and acceptor vectors respectively, are co-transformed into the host 
cell using a variety of methods known in the art (e.g., transformation of cells made 
competent by treatment with CaCl2, electroporation, etc.). The co-transformed 
host cells are grown under conditions which select for the presence of the 
recombinant-functional selectable marker created by recombination of pDonor 
with the pAcceptor (e.g., growth in the presence of chloramphenicol and sucrose 
when the pDonor vector contains the SacB negative selection marker on the 
non-donor fragment and all or part of the chloramphenicol resistance gene open 
reading frame, and pAcceptor may also contain a promoter necessary for 
expression of the chloramphenicol open reading frame). Plasmid DNA is isolated from 
host cells which grow in the presence of the selective pressure and is subjected 
to restriction enzyme digestion to confirm that the desired recombination event 
has occurred. 
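
The selection logic of the chloramphenicol and sucrose example can be expressed as a small truth table. The following sketch is a deliberately simplified model and not a description of cell physiology: the class, its fields and the function are hypothetical names, a cell is represented only by the plasmids it carries, and effects such as plasmid loss or the fate of the by-product circle are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    has_sacB: bool            # negative marker: lethal on sucrose
    cm_orf: bool              # carries the chloramphenicol resistance ORF
    cm_promoter_linked: bool  # a promoter is positioned to drive that ORF


def survives(plasmids) -> bool:
    """True if a cell carrying these plasmids grows on chloramphenicol + sucrose."""
    sucrose_ok = not any(p.has_sacB for p in plasmids)
    cm_ok = any(p.cm_orf and p.cm_promoter_linked for p in plasmids)
    return sucrose_ok and cm_ok


# Simplified representations of the three molecules in the example above.
donor = Plasmid("pDonor", has_sacB=True, cm_orf=True, cm_promoter_linked=False)
acceptor = Plasmid("pAcceptor", has_sacB=False, cm_orf=False, cm_promoter_linked=False)
product = Plasmid("product", has_sacB=False, cm_orf=True, cm_promoter_linked=True)
```

In this model only cells carrying the product vector without an intact pDonor survive. A cell carrying the product together with leftover pAcceptor also survives, which is consistent with the restriction digestion used to confirm the desired recombination event.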


The present invention also provides a method for the in vitro recombination 
of nucleic acid constructs, comprising the steps of: a) providing i) a donor nucleic 
acid construct comprising a donor-partial selectable marker, two donor 
sequence-specific recombinase target sites each having a defined 5' to 3' orientation 
and wherein the donor sequence-specific recombinase target sites are placed in the 
donor construct such that they have the same 5' to 3' orientation, and a unique 
restriction enzyme site or polylinker, the restriction enzyme site or polylinker being 
located 3' of the first-donor sequence-specific recombinase target site and 5' of 
the second-donor sequence-specific recombinase target site; ii) an acceptor 
nucleic acid construct comprising an origin of replication, an acceptor 
sequence-specific recombinase target site having a defined 5' to 3' orientation, a first 
promoter adjacent to the 5' end of the acceptor sequence-specific recombinase 
target site, and an acceptor-partial selectable marker, wherein the acceptor-partial 
selectable marker is capable of recombining with the donor-partial selectable 
marker from the donor construct to create a recombinant-functional selectable 
marker in a final desired recombination construct; b) contacting the donor and 
acceptor constructs in vitro with a site-specific recombinase under conditions 
such that the desired donor fragment recombines with the acceptor construct to 
form a final desired recombination construct. 

The present invention further provides a method for the recombination of 
nucleic acid constructs in a host, comprising the steps of: a) providing i) a donor 
nucleic acid construct comprising a donor-partial selectable marker, two donor 
sequence-specific recombinase target sites each having a defined 5' to 3' 
orientation and wherein the donor sequence-specific recombinase target sites are 
placed in the donor construct such that they have the same 5' to 3' orientation, 
and a unique restriction enzyme site or polylinker, the restriction enzyme site or 
polylinker located 3' of the first-donor sequence-specific recombinase target site 
and 5' of the second-donor sequence-specific recombinase target site; ii) an 
acceptor nucleic acid construct comprising an origin of replication, an acceptor 
sequence-specific recombinase target site having a defined 5' to 3' orientation, a 
first promoter adjacent to the 5' end of the acceptor sequence-specific 
recombinase target site, and an acceptor-partial selectable marker, wherein the 
acceptor-partial selectable marker is capable of recombining with the 
donor-partial selectable marker from the donor to create a recombinant-functional 
selectable marker in a final desired recombination construct; and iii) a host cell 
expressing a site-specific recombinase; b) introducing the donor and acceptor 
constructs into the host cell under conditions such that the desired donor 
fragment recombines with the acceptor construct to form the final desired 
recombination construct which is capable of imparting the ability to the host cell to 
grow in selective growth medium. 

The above methods of producing expression vectors can be employed to 
rapidly produce a plurality of different expression vectors that are distinct from 
each other but carry the same coding sequence of interest from a single, original 
type of donor vector. In other words, the subject methods can be used to rapidly 
clone a nucleic acid of interest from an initial vector into a plurality of expression 
vectors. By plurality is meant at least 2, usually at least 5, and more usually at 
least 10, where the number may be as high as 20, 96 or more. The methods can 
be performed by one person in a period of time that is a fraction of what it would 
take that person of skill in the art to produce the same number and variety of 
expression vectors using traditional cutting and ligation protocols, where the 
increase in efficiency obtained by the subject methods is at least about 6 fold, 
usually at least about 15 fold and more usually at least about 30 fold. 

The Resultant Product Vector 

The above steps result in the production of an intron-containing product 
vector (i.e., a vector that includes one or more, e.g., one or two, spliceable introns) 
from donor and acceptor vectors, and in certain embodiments from a portion of 
one of these vectors and the entirety of the other of these vectors, e.g., from a 
portion of the donor vector and the entirety of the acceptor vector, where by 
portion is meant the part of the donor vector that lies 3' of the first donor 
sequence-specific recombinase site and 5' of the second donor 
sequence-specific recombinase site. The size of the product vector may vary, depending on 
the nature of the vector. Where the vector is a plasmid, the size of the expression 
vector may range from about 3 kb to 20 kb, usually from about 4 kb to 8 kb. 

The resultant product vector in many embodiments is characterized in that 
it includes two recombinase recognition sites, i.e., a first and second recombinase 
recognition site, oriented in the same direction. The distance between the first 
and second recombinase sites, specifically the distance between the 3' end of the 
first recombinase site and the 5' end of the second recombinase site, ranges in 
many embodiments from about 100 bp to 100 kb, usually from about 500 bp to 
20 kb, depending on whether the coding sequence of a protein of interest or just a 
restriction site/multiple cloning site is present between the first and second 
recombinase recognition sites. The portion of the vector that lies in this 
inter-recombinase region, i.e., 3' of the first recombinase site and 5' of the second 
recombinase site, typically makes up from about 2% to 85%, usually from about 
20% to 60% of the entire expression vector. 
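
The proportion of a circular vector occupied by the inter-recombinase region follows directly from the site coordinates. The sketch below assumes a 0-based coordinate convention in which the first argument after the vector length is the position of the first base 3' of the first site and the next is the position of the first base of the second site; this convention, the function name and the example coordinates are assumptions for illustration.

```python
def inter_site_fraction(vector_len, first_site_end, second_site_start):
    """Fraction of a circular vector lying 3' of the first site and 5' of the second.

    Coordinates are 0-based. The modulo handles a region that wraps around
    the arbitrary origin of the plasmid map.
    """
    for pos in (first_site_end, second_site_start):
        if not 0 <= pos < vector_len:
            raise ValueError("coordinate outside the vector")
    span = (second_site_start - first_site_end) % vector_len
    return span / vector_len
```

For a hypothetical 6000 bp plasmid with the region running from position 1000 to position 3400, the function returns 0.4, i.e., 40% of the vector, which falls inside the usual 20% to 60% range given above.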

In many embodiments, the expression vector is further characterized in 
that 5' of the first recombinase site is a first promoter, 3' of the first recombinase 
site is at least one restriction site, and the second recombinase site is located inside 
a functional selectable marker, i.e., it is flanked by disparate portions or sub-parts 
of a selectable marker expression module or cassette (e.g., a promoter and a 
coding sequence), where the second recombinase site is present between the 
two sub-parts of the selectable marker in a manner such that the selectable 
marker is functional, i.e., the coding sequence of the selectable marker is 
expressed. In other words, the expression vector includes a selectable marker 
expression cassette or module made up of a promoter and coding sequence that 
flank the second recombinase site. In many embodiments, the second 
recombinase site is flanked by a promoter on its 3' end and a coding sequence of 
the selectable marker on its 5' end. In this embodiment, the first and second 
promoters, located 5' of the first recombinase site and 3' of the second 
recombinase site, respectively, are oriented in opposite directions. 

The expression vector is further characterized by having at least one 
restriction site, and generally a multiple cloning site, located between the first and 
second recombinase sites. In many embodiments, located between the first and 
second recombinase sites, and flanked by two restriction sites, which may or may 
not be the same, is a nucleic acid of interest, i.e., gene of interest, that includes a 
coding sequence for a protein of interest whose expression from the expression 
vector is desired. In these embodiments, the first promoter 5' of the first 
recombinase site and the coding sequence for the protein of interest are arranged 
on either side of the first recombinase site such that they form an expression 
module or cassette that expresses the encoded protein, i.e., the coding sequence 
and first promoter flank the first recombinase site in a manner such that they are 
operably linked. 

In addition to the above features, the expression vector further includes at 
least one origin of replication that provides for replication in the host or hosts into 
which it is placed or transformed during use. Origins of replication of interest 
include, but are not limited to, those described above in connection with the donor 
and acceptor vectors. 
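
The feature arrangement described in the preceding paragraphs can be sketched by treating each vector as an ordered list of oriented features and modelling integration as insertion of the resolved donor fragment at the acceptor's single site. This is an abstract illustration only: the feature names (`P1`, `P2`, `GOI`, `marker_ORF`), the list representation and both functions are assumptions made for this example, and the strand exchange chemistry of the recombinase is not modelled.

```python
PROMOTERS = {"P1", "P2"}
SITES = {"lox"}
NON_CODING = PROMOTERS | SITES | {"ori"}


def integrate(acceptor, fragment, site=("lox", "+")):
    """Insert a single-site circular fragment at the acceptor's site.

    The product carries two sites in the same orientation flanking the insert.
    """
    i = acceptor.index(site)
    return acceptor[: i + 1] + fragment + [site] + acceptor[i + 1:]


def driven_features(vector):
    """Names of coding features a promoter reads into, across at most one site."""
    driven = set()
    n = len(vector)
    for i, (name, strand) in enumerate(vector):
        if name not in PROMOTERS:
            continue
        step = 1 if strand == "+" else -1   # "-" promoters read leftward
        j = (i + step) % n                  # modulo: the vector is circular
        if vector[j][0] in SITES:           # read through the recombinase site
            j = (j + step) % n
        target, target_strand = vector[j]
        if target not in NON_CODING and target_strand == strand:
            driven.add(target)
    return driven


# Acceptor: first promoter, single site, oppositely oriented second promoter.
acceptor = [("ori", "+"), ("P1", "+"), ("lox", "+"), ("P2", "-")]
# Resolved donor fragment: gene of interest plus the marker coding sequence.
fragment = [("GOI", "+"), ("marker_ORF", "-")]
product = integrate(acceptor, fragment)
```

In this model neither promoter drives a coding feature in the acceptor alone, whereas in the product the first promoter reads across the first site into the gene of interest and the second promoter reads across the second site into the marker coding sequence, matching the two cassettes described above.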

In certain embodiments, the product vector contains a gene or DNA 
sequence of interest inserted into the unique restriction enzyme site or polylinker 
such that the gene or DNA sequence of interest is under the control of the first 
promoter. The gene or DNA sequence of interest is joined to the 3' end of the 
first-recombinant sequence-specific recombinase target site such that a functional 
transcriptional unit is formed so that the gene or DNA sequence of interest is 
expressed as a protein driven by the first promoter of the acceptor construct. In a 
more preferred embodiment, the gene of interest is joined to the 3' end of the 
first-recombinant sequence-specific recombinase target site such that a functional 
translational reading frame is created wherein the gene or DNA sequence of 
interest is expressed as a fusion protein with an affinity domain or tag sequence 
derived from the acceptor plasmid and under the expression control of the first 
promoter of the acceptor construct. 

In another preferred embodiment, the gene of interest is joined to the donor splice 
site such that when the intron is spliced out of the resultant mRNA, the gene of 
interest is fused in frame to a C-terminal tag derived from the acceptor vector. 

In certain embodiments, the product vector further comprises an 
acceptor-functional selectable marker gene derived from the acceptor construct. If an 
acceptor-functional selectable marker gene is present in addition to the 
newly-created recombinant-functional selectable marker, the acceptor-functional 
selectable marker is a different selectable marker from the newly-created 
recombinant-functional selectable marker. The present invention should not be 
limited by the nature of the selectable marker genes chosen; the marker genes 
may result in positive or negative selection and may be chosen from the group 
including, but not limited to, the chloramphenicol resistance gene, the ampicillin 
resistance gene, the tetracycline resistance gene, the kanamycin resistance 
gene, the streptomycin resistance gene, the strA gene and the sacB gene. 

In addition to the above features, the product vector further includes at 
least one, and typically one to two, spliceable introns. The one or more introns 
may be positioned anywhere in the product vector. In certain representative 
embodiments, the 3' recombinase recognized site is present in an intron. In other 
representative embodiments, the 5' recombinase recognized site is present in an 
intron. In yet other representative embodiments, both the 5' and 3' recombinase 
recognized sites are present in introns. 



Utility 



The subject methods find use in a variety of different applications, where 
such applications are generally those protocols and methods in which the transfer 
of a nucleic acid of interest from one vector to another, e.g., the cloning of a 
nucleic acid from an initial vector into a final vector, is desired. As such, the 
subject methods are particularly suited for use in cloning nucleic acids of interest, 
including whole libraries, from an initial vector into an expression vector, where 
the product vector may be functionalized to express the polypeptide or protein 
encoded by the nucleic acid of interest located on it in a variety of different 
desired environments and/or under desired conditions, e.g., in a cell of interest, in 
response to a particular stimulus, tagged by a detectable marker, etc. 

As such, the product vectors produced by the subject methods find use in 
a variety of different applications, including the study of polypeptide and protein 
function and behavior, i.e., in the characterization of a polypeptide or protein, 
either known or unknown; and the like. In the broadest sense, the subject 
methods find application in any method where traditional digestion and ligation 
protocols are employed to transfer or clone a nucleic acid from one vector to 
another, e.g., cloning digestion and ligation protocols, where the expression 
vectors produced by the subject methods find use in research applications, as 
well as other applications, e.g., protein production applications, therapeutic 
applications, and the like. 

Depending on the location of the one or more introns in the product 
vectors, the product vectors find use in the expression of non-fusion proteins, 
e.g., proteins free of residues at their N- and C-termini that are encoded by 
sequence specific recombinase sites; N- and/or C-terminally tagged proteins, etc. 

Systems 

Also provided are systems for use in practicing the subject methods. The 
subject systems at least include a donor vector and an acceptor vector as 
described above. In addition, the subject systems may include a recombinase 
which recognizes the recombinase sites present on the donor and acceptor 
vectors. The systems may also include, where desired, a host cell, e.g., in in vivo 
methods of expression vector production, as described above. Other components 
of the subject systems include, but are not limited to: reaction buffer, controls, etc. 

Libraries 

Also provided are nucleic acid libraries cloned into donor and/or acceptor 
vectors of the subject invention. These nucleic acid libraries are made up of a 
plurality of individual donor/acceptor vectors where each distinct constituent 
member of the library has a different nucleic acid portion or component, e.g., 
genomic fragment, cDNA, of an original whole nucleic acid library, i.e., 
fragmented genome, cDNA collection generated from the total or partial mRNA of 
an mRNA sample, etc. In other words, the libraries of the subject invention are 
nucleic acid libraries cloned into donor or acceptor vectors according to the 
subject invention, where the nucleic acid libraries include, but are not limited to, 
genomic libraries, cDNA libraries, etc. Specific donor/acceptor libraries of interest 
include, but are not limited to: Human Brain Poly A+ RNA; Human Heart Poly A+ 
RNA; Human Kidney Poly A+ RNA; Human Liver Poly A+ RNA; Human Lung Poly 
A+ RNA; Human Pancreas Poly A+ RNA; Human Placenta Poly A+ RNA; Human 
Skeletal Muscle Poly A+ RNA; Human Testis Poly A+ RNA; Human Prostate Poly 
A+ RNA and the like. With donor libraries according to the subject invention, the 
subject methods permit the rapid exchange of either individual clones of interest, 
groups of clones or potentially an entire cDNA library to a variety of expression 
vectors. 

Kits 


Also provided are kits for use in practicing the subject methods. The 
subject kits include at least one donor vector and a recombinase that 
recognizes the recombinase sites of the donor vector. The subject kits may 
further include other components that find use in the subject methods, e.g., 
acceptor vectors, reaction buffers, positive controls, negative controls, etc. 

In addition to the above components, the subject kits will further include 
instructions for practicing the subject methods. These instructions may be present 
in the subject kits in a variety of forms, one or more of which may be present in 
the kit. One form in which these instructions may be present is as printed 
information on a suitable medium or substrate, e.g., a piece or pieces of paper on 
which the information is printed, in the packaging of the kit, in a package insert, 
etc. Yet another means would be a computer readable medium, e.g., diskette, 
CD, etc., on which the information has been recorded. Yet another means that 
may be present is a website address which may be used via the internet to 
access the information at a removed site. Any convenient means may be present 
in the kits. 



The following examples are offered by way of illustration and not by way of 
limitation. 

B, F & F Ref: CLON-069 
Clontech Ref: P-90 

F:\DOCUMENT\CLON\069\patent application.doc 



EXPERIMENTAL 



Example 1. Representative Protocols 

A. 

Figure 5 provides a flow diagram of a representative recombinase based 
method according to the subject invention. 

B. 

In order to test the utility of intron splicing to enable tagging of a protein of 
interest in a donor vector with a peptide tag or protein in an acceptor vector, a 
Donor vector and an Acceptor vector capable of splicing were built using standard 
molecular biology techniques. The Donor vector was called pDNR-Dual. A map 
of this vector is provided in Figure 1 and its sequence is provided below as SEQ 
ID NO:01. The Acceptor vector was called pLPS-EGFP. A map of this vector is 
provided in Figure 2 and its sequence is provided below as SEQ ID NO:02. 
Further, a luciferase test gene was cloned, using standard techniques, into the 
MCS of pDNR-Dual at the Sal I and Apa I sites, so as to generate pDNR-Dual-Luc. 
A map of this vector is provided in Figure 3 and the sequence of this vector 
is provided below as SEQ ID NO:03. In so doing, the luciferase gene was placed 
such that it had no stop codon and such that it would be in-frame with the EGFP 
tag present in pLPS-EGFP following Cre/Lox-based transfer from the Donor to the 
Acceptor. 
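
The two placement requirements stated above (no stop codon, and a reading frame that can continue into a downstream tag) can be checked on a candidate insert with a few lines of code. The following is a minimal sketch only, assuming the insert is supplied as a plain lowercase or uppercase DNA string that begins at its start codon; the function names and the toy sequences are hypothetical and are not taken from the vectors described here. It does not model the junction or intron sequence between the insert and the tag.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def in_frame_stops(cds: str) -> list[int]:
    """Return 0-based offsets of stop codons read in frame from the first base."""
    cds = cds.lower()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in STOP_CODONS]


def can_fuse(cds: str) -> bool:
    """True if the insert starts with ATG, is a whole number of codons,
    and contains no in-frame stop codon (so translation could read through
    into a downstream tag)."""
    cds = cds.lower()
    return cds.startswith("atg") and len(cds) % 3 == 0 and not in_frame_stops(cds)


# Hypothetical toy inserts for illustration only
print(can_fuse("atggaagacgccaaa"))  # open through the last codon
print(can_fuse("atggaagacgcctaa"))  # ends in a stop codon
```

A check of this kind only screens the insert itself; whether the fusion is actually in frame also depends on the bases contributed by the vector between the insert and the tag.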

The pDNR-Dual-Luc and pLPS-EGFP vectors were then recombined in 
vitro using Cre according to methods described in Clontech's Creator User 
Manual (Clontech Laboratories Inc., Palo Alto CA) (see also the methods 
disclosed in U.S. Application Serial No. 09/616,651, the disclosure of which is 
herein incorporated by reference), and an aliquot of the reaction was transformed 
into competent E. coli. Following selection on chloramphenicol and sucrose 
plates, recombinant clones were isolated and confirmed by standard restriction 
mapping and sequencing to encode the expected recombinant molecule, having 
the luciferase gene from the donor vector transferred to the acceptor vector. This 
vector is called pLPS-Luc-EGFP. A map of this vector is provided in Figure 4 and 
the sequence of this vector is provided below as SEQ ID NO:04. This construct 
thus has both a splice donor sequence, provided from the donor vector, and a 
splice acceptor sequence, provided by the acceptor vector. Together, these 
create an artificial intron between the 3' end of the luciferase gene and the 5' end 
of the EGFP tag. This intron is composed of the chloramphenicol open 
reading frame, the second LoxP site, and the ampicillin promoter sequence. 
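
The arrangement of the product vector described above can be illustrated in silico. The following is a toy string model only, not the protocol used in this example: it assumes the donor carries exactly two LoxP sites in the same orientation, the acceptor carries exactly one, and the segment between the donor sites is placed at the acceptor site so that it ends up flanked by two LoxP sites. The 34 bp site string is the one that appears in the vector sequences listed in Example 2; the function names and the short flanking sequences are hypothetical.

```python
# 34 bp LoxP site as it appears in the vector sequences of Example 2
LOXP = "ataacttcgtatagcatacattatacgaagttat"


def find_sites(seq: str, site: str = LOXP) -> list[int]:
    """Return every start index of `site` in `seq` (given strand only)."""
    seq = seq.lower()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def transfer(donor: str, acceptor: str) -> str:
    """Toy model: move the donor segment lying between its two LoxP sites
    into the acceptor at the acceptor's single LoxP site."""
    donor, acceptor = donor.lower(), acceptor.lower()
    d_sites = find_sites(donor)
    a_sites = find_sites(acceptor)
    if len(d_sites) != 2 or len(a_sites) != 1:
        raise ValueError("expected two LoxP sites in donor and one in acceptor")
    # First donor LoxP plus everything up to (not including) the second LoxP
    cassette = donor[d_sites[0]:d_sites[1]]
    # The acceptor's own LoxP then follows the cassette in the product
    return acceptor[:a_sites[0]] + cassette + acceptor[a_sites[0]:]


# Hypothetical miniature "vectors" for illustration only
donor = "gg" + LOXP + "atgaaa" + LOXP + "cc"
acceptor = "tt" + LOXP + "ggg"
product = transfer(donor, acceptor)
print(len(find_sites(product)))
```

The model treats the vectors as linear strings and ignores site orientation on the opposite strand, selection, and reaction efficiency; it is meant only to show why the transferred segment is flanked by two LoxP sites in the product.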
To test if this construct would generate a properly spliced mRNA, thereby 
enabling expression of a luciferase-EGFP fusion protein, the pLPS-Luc-EGFP 
vector was then transfected into HEK293 cells using standard procedures known 
in the art. For comparison, the HEK293 cells were also transfected with a pLuc-EGFP 
construct. This construct was made by cloning the luciferase gene (without 
stop codon) in-frame with EGFP into the pEGFP-N1 vector (available from 
Clontech Laboratories, Inc., Palo Alto CA) using standard molecular biology 
techniques. 

Twenty-four hours after transfection, the cells were examined for EGFP 
fluorescence using a fluorescence microscope. Both the splicing construct 
(pLPS-Luc-EGFP) and the direct luciferase-EGFP fusion (pLuc-EGFP) showed 
equivalent EGFP expression over untransfected control cells. 

Extracts of the cells were then made and analyzed by western blotting using an 
anti-luciferase antibody. Again, both the splicing construct (pLPS-Luc-EGFP) 
and the direct luciferase-EGFP fusion (pLuc-EGFP) showed equivalent 
expression of the luciferase-EGFP fusion protein. A further analysis of total RNA 
extracted from cells transfected with the splicing construct (pLPS-Luc-EGFP) by 
Northern blotting demonstrated that the mRNA generated from the construct was 
being efficiently spliced to remove the chloramphenicol sequences. 


Example 2. Vector Sequence Information 



A. pDNR-Dual 

1 gcggccgcat aacttcgtat agcatacatt atacgaagtt atcagtcgac ggtaccggac 
61 atatgcccgg gaattcctgc aggatccgct cgagaagctt tctagaccat tcgtttggcg 
121 cgcgggccca ggtgagtggt cataatcata atcataatca taatcataat cacaactagc 
181 ctaggagatc ctggtcatga ctagtgcttg gattctcacc aataaaaaac gcccggcggc 
241 aaccgagcgt tctgaacaaa tccagatgga gttctgaggt cattactgga tctatcaaca 
301 ggagtccaag cgagctcgat atcaaattac gccccgccct gccactcatc gcagtactgt 
361 tgtaattcat taagcattct gccgacatgg aagccatcac aaacggcatg atgaacctga 
421 atcgccagcg gcatcagcac cttgtcgcct tgcgtataat atttgcccat ggtgaaaacg 
481 ggggcgaaga agttgtccat attggccacg tttaaatcaa aactggtgaa actcacccag 
541 ggattggctg agacgaaaaa catattctca ataaaccctt tagggaaata ggccaggttt 
601 tcaccgtaac acgccacatc ttgcgaatat atgtgtagaa actgccggaa atcgtcgtgg 
661 tattcactcc agagcgatga aaacgtttca gtttgctcat ggaaaacggt gtaacaaggg 
721 tgaacactat cccatatcac cagctcaccg tctttcattg ccatacgaaa ttccggatga 
781 gcattcatca ggcgggcaag aatgtgaata aaggccggat aaaacttgtg cttatttttc 
841 tttacggtct ttaaaaaggc cgtaatatcc agctgaacgg tctggttata ggtacattga 
901 gcaactgact gaaatgcctc aaaatgttct ttacgatgcc attgggatat atcaacggtg 
961 gtatatccag tgattttttt ctccatttta gcttccttag ctcctgaaag atccataact 
1021 tcgtatagca tacattatac gaagttatgc ggccgcgacg tccacatata cctgccgttc 
1081 actattattt agtgaaatga gatattatga tattttctga attgtgatta aaaaggcaac 
1141 tttatgccca tgcaacagaa actataaaaa atacagagaa tgaaaagaaa cagatagatt 
1201 ttttagttct ttaggcccgt agtctgcaaa tccttttatg attttctatc aaacaaaaga 
1261 ggaaaataga ccagttgcaa tccaaacgag agtctaatag aatgaggtcg aaaagtaaat 
1321 cgcgcgggtt tgttactgat aaagcaggca agacctaaaa tgtgtaaagg gcaaagtgta 
1381 tactttggcg tcacccctta catattttag gtcttttttt attgtgcgta actaacttgc 
1441 catcttcaaa caggagggct ggaagaagca gaccgctaac acagtacata aaaaaggaga 
1501 catgaacgat gaacatcaaa aagtttgcaa aacaagcaac agtattaacc tttactaccg 
1561 cactgctggc aggaggcgca actcaagcgt ttgcgaaaga aacgaaccaa aagccatata 
1621 aggaaacata cggcatttcc catattacac gccatgatat gctgcaaatc cctgaacagc 
1681 aaaaaaatga aaaatatcaa gttcctgagt tcgattcgtc cacaattaaa aatatctctt 
1741 ctgcaaaagg cctggacgtt tgggacagct ggccattaca aaacgctgac ggcactgtcg 
1801 caaactatca cggctaccac atcgtctttg cattagccgg agatcctaaa aatgcggatg 
1861 acacatcgat ttacatgttc tatcaaaaag tcggcgaaac ttctattgac agctggaaaa 
1921 acgctggccg cgtctttaaa gacagcgaca aattcgatgc aaatgattct atcctaaaag 
1981 accaaacaca agaatggtca ggttcagcca catttacatc tgacggaaaa atccgtttat 
2041 tctacactga tttctccggt aaacattacg gcaaacaaac actgacaact gcacaagtta 
2101 acgtatcagc atcagacagc tctttgaaca tcaacggtgt agaggattat aaatcaatct 
2161 ttgacggtga cggaaaaacg tatcaaaatg tacagcagtt catcgatgaa ggcaactaca 
2221 gctcaggcga caaccatacg ctgagagatc ctcactacgt agaagataaa ggccacaaat 
2281 acttagtatt tgaagcaaac actggaactg aagatggcta ccaaggcgaa gaatctttat 
2341 ttaacaaagc atactatggc aaaagcacat cattcttccg tcaagaaagt caaaaacttc 
2401 tgcaaagcga taaaaaacgc acggctgagt tagcaaacgg cgctctcggt atgattgagc 
2461 taaacgatga ttacacactg aaaaaagtga tgaaaccgct gattgcatct aacacagtaa 
2521 cagatgaaat tgaacgcgcg aacgtcttta aaatgaacgg caaatggtac ctgttcactg 
2581 actcccgcgg atcaaaaatg acgattgacg gcattacgtc taacgatatt tacatgcttg 
2641 gttatgtttc taattcttta actggcccat acaagccgct gaacaaaact ggccttgtgt 
2701 taaaaatgga tcttgatcct aacgatgtaa cctttactta ctcacacttc gctgtacctc 
2761 aagcgaaagg aaacaatgtc gtgattacaa gctatatgac aaacagagga ttctacgcag 
2821 acaaacaatc aacgtttgcg cctagcttcc tgctgaacat caaaggcaag aaaacatctg 
2881 ttgtcaaaga cagcatcctt gaacaaggac aattaacagt taacaaataa aaacgcaaaa 
2941 gaaaatgccg atatcctatt ggcattgacg tcaggtggca cttttcgggg aaatgtgcgc 
3001 ggaaccccta tttgtttatt tttctaaata cattcaaata tgtatccgct catgagacaa 

3061 taaccctgat aaatgcttca ataatattga aaaaggaaga gtatgagtat tcaacatttc 
3121 cgtgtcgccc ttattccctt ttttgcggca ttttgccttc ctgtttttgc tcacccagaa 
3181 acgctggtga aagtaaaaga tgctgaagat cagttgggtg cacgagtggg ttacatcgaa 
3241 ctggatctca acagcggtaa gatccttgag agttttcgcc ccgaagaacg ttttccaatg 
3301 atgagcactt ttaaagttct gctatgtggc gcggtattat cccgtattga cgccgggcaa 
3361 gagcaactcg gtcgccgcat acactattct cagaatgact tggttgagta ctcaccagtc 
3421 acagaaaagc atcttacgga tggcatgaca gtaagagaat tatgcagtgc tgccataacc 
3481 atgagtgata acactgcggc caacttactt ctgacaacga tcggaggacc gaaggagcta 
3541 accgcttttt tgcacaacat gggggatcat gtaactcgcc ttgatcgttg ggaaccggag 
3601 ctgaatgaag ccataccaaa cgacgagcgt gacaccacga tgcctgtagc aatggcaaca 
3661 acgttgcgca aactattaac tggcgaacta cttactctag cttcccggca acaattaata 

3721 gactggatgg aggcggataa agttgcagga ccacttctgc gctcggccct tccggctggc 
3781 tggtttattg ctgataaatc tggagccggt gagcgtgggt ctcgcggtat cattgcagca 
3841 ctggggccag atggtaagcc ctcccgtatc gtagttatct acacgacggg gagtcaggca 
3901 actatggatg aacgaaatag acagatcgct gagataggtg cctcactgat taagcattgg 
3961 taactgtcag accaagttta ctcatatata ctttagattg atttaaaact tcatttttaa 
4021 tttaaaagga tctaggtgaa gatccttttt gataatctca tgaccaaaat cccttaacgt 
4081 gagttttcgt tccactgagc gtcagacccc gtagaaaaga tcaaaggatc ttcttgagat 
4141 cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct accagcggtg 

4201 gtttgtttgc cggatcaaga gctaccaact ctttttccga aggtaactgg cttcagcaga 
4261 gcgcagatac caaatactgt tcttctagtg tagccgtagt taggccacca cttcaagaac 
4321 tctgtagcac cgcctacata cctcgctctg ctaatcctgt taccagtggc tgctgccagt 

4381 ggcgataagt cgtgtcttac cgggttggac tcaagacgat agttaccgga taaggcgcag 
4441 cggtcgggct gaacgggggg ttcgtgcaca cagcccagct tggagcgaac gacctacacc 
4501 gaactgagat acctacagcg tgagctatga gaaagcgcca cgcttcccga agggagaaag 
4561 gcggacaggt atccggtaag cggcagggtc ggaacaggag agcgcacgag ggagcttcca 
4621 gggggaaacg cctggtatct ttatagtcct gtcgggtttc gccacctctg acttgagcgt 
4681 cgatttttgt gatgctcgtc aggggggcgg agcctatgga aaaacgccag caacgcggcc 
4741 tttttacggt tcctggcctt ttgctggcct tttgctcaca tgttctttcc tgcgttatcc 
4801 cctgattctg tggataaccg tattaccgcc ttacgcgtgt aaaacgacgg ccagtagatc 
4861 tgtaatacga ctcactatag ggcgctagct gctcgccgca gccgaacgac cgagcgcagc 

4921 gagtcagtga gcgaggaa (SEQ ID NO: 01) 
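
The listing layout used in this example (a position label followed by blocks of ten bases, sixty bases per row) lends itself to a simple consistency check. The following is a minimal sketch, assuming each row is on its own line in that layout and that any trailing label such as "(SEQ ID NO: 01)" follows the last block; the function name is hypothetical. It joins the blocks into one string and verifies that every printed position label matches the number of bases read so far.

```python
def parse_listing(text: str) -> str:
    """Join rows such as '61 atatgcccgg gaattcctgc ...' into one sequence,
    raising ValueError if a row's position label does not match the number
    of bases already read."""
    bases: list[str] = []
    count = 0  # bases read so far
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # skip blank lines and headings
        position = int(tokens[0])
        if position != count + 1:
            raise ValueError(f"row labelled {position}, expected {count + 1}")
        for tok in tokens[1:]:
            if not set(tok) <= set("acgt"):
                break  # e.g. a trailing "(SEQ ID NO: ...)" label
            bases.append(tok)
            count += len(tok)
    return "".join(bases)


# First row of the listing above plus a shortened, hypothetical second row
listing = """
1 gcggccgcat aacttcgtat agcatacatt atacgaagtt atcagtcgac ggtaccggac
61 atatgcccgg gaattcctgc (SEQ ID NO: 99)
"""
sequence = parse_listing(listing)
print(len(sequence))
```

Because the position labels are validated row by row, a dropped or duplicated block in a transcribed listing surfaces as an error at the first row whose label no longer agrees with the running count.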



B. pLPS-EGFP 

1 tagttattaa tagtaatcaa ttacggggtc attagttcat agcccatata tggagttccg 
61 cgttacataa cttacggtaa atggcccgcc tggctgaccg cccaacgacc cccgcccatt 
121 gacgtcaata atgacgtatg ttcccatagt aacgccaata gggactttcc attgacgtca 
181 atgggtggag tatttacggt aaactgccca cttggcagta catcaagtgt atcatatgcc 
241 aagtacgccc cctattgacg tcaatgacgg taaatggccc gcctggcatt atgcccagta 
301 catgacctta tgggactttc ctacttggca gtacatctac gtattagtca tcgctattac 
361 catggtgatg cggttttggc agtacatcaa tgggcgtgga tagcggtttg actcacgggg 
421 atttccaagt ctccacccca ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg 
481 ggactttcca aaatgtcgta acaactccgc cccattgacg caaatgggcg gtaggcgtgt 
541 acggtgggag gtctatataa gcagagctgg tttagtgaac cgtcagatcc gctagcataa 
601 cttcgtatag catacattat acgaagttat agatccaata ttattgaagc atttatcagg 
661 gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg 
721 ttccgcgcac atttccccga aaagtgccac ctgacgtgga tctcgagctc aagcttcgaa 
781 ttcagggttt ccttgacaat atcatactta tcctgtccct tttttttcca cagctaccgg 
841 tcgcgagcaa gggcgaggag ctgttcaccg gggtggtgcc catcctggtc gagctggacg 
901 gcgacgtaaa cggccacaag ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg 
961 gcaagctgac cctgaagttc atctgcacca ccggcaagct gcccgtgccc tggcccaccc 
1021 tcgtgaccac cctgacctac ggcgtgcagt gcttcagccg ctaccccgac cacatgaagc 
1081 agcacgactt cttcaagtcc gccatgcccg aaggctacgt ccaggagcgc accatcttct 
1141 tcaaggacga cggcaactac aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg 
1201 tgaaccgcat cgagctgaag ggcatcgact tcaaggagga cggcaacatc ctggggcaca 
1261 agctggagta caactacaac agccacaacg tctatatcat ggccgacaag cagaagaacg 
1321 gcatcaaggt gaacttcaag atccgccaca acatcgagga cggcagcgtg cagctcgccg 
1381 accactacca gcagaacacc cccatcggcg acggccccgt gctgctgccc gacaaccact 
1441 acctgagcac ccagtccgcc ctgagcaaag accccaacga gaagcgcgat cacatggtcc 
1501 tgctggagtt cgtgaccgcc gccgggatca ctctcggcat ggacgagctg tacaagtaaa 
1561 gcggccgcga ctctagatca taatcagcca taccacattt gtagaggttt tacttgcttt 
1621 aaaaaacctc ccacacctcc ccctgaacct gaaacataaa atgaatgcaa ttgttgttgt 
1681 taacttgttt attgcagctt ataatggtta caaataaagc aatagcatca caaatttcac 
1741 aaataaagca tttttttcac tgcattctag ttgtggtttg tccaaactca tcaatgtatc 
1801 ttaaggcgta aattgtaagc gttaatattt tgttaaaatt cgcgttaaat ttttgttaaa 
1861 tcagctcatt ttttaaccaa taggccgaaa tcggcaaaat cccttataaa tcaaaagaat 
1921 agaccgagat agggttgagt gttgttccag tttggaacaa gagtccacta ttaaagaacg 
1981 tggactccaa cgtcaaaggg cgaaaaaccg tctatcaggg cgatggccca ctacgtgaac 
2041 catcacccta atcaagtttt ttggggtcga ggtgccgtaa agcactaaat cggaacccta 
2101 aagggagccc ccgatttaga gcttgacggg gaaagccggc gaacgtggcg agaaaggaag 
2161 ggaagaaagc gaaaggagcg ggcgctaggg cgctggcaag tgtagcggtc acgctgcgcg 
2221 taaccaccac acccgccgcg cttaatgcgc cgctacaggg cgcgtcaggt ggcacttttc 
2281 ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 
2341 cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtcctg 
2401 aggcggaaag aaccagctgt ggaatgtgtg tcagttaggg tgtggaaagt ccccaggctc 
2461 cccagcaggc agaagtatgc aaagcatgca tctcaattag tcagcaacca ggtgtggaaa 
2521 gtccccaggc tccccagcag gcagaagtat gcaaagcatg catctcaatt agtcagcaac 
2581 catagtcccg cccctaactc cgcccatccc gcccctaact ccgcccagtt ccgcccattc 
2641 tccgccccat ggctgactaa ttttttttat ttatgcagag gccgaggccg cctcggcctc 
2701 tgagctattc cagaagtagt gaggaggctt ttttggaggc ctaggctttt gcaaagatcg 
2761 atcaagagac aggatgagga tcgtttcgca tgattgaaca agatggattg cacgcaggtt 
2821 ctccggccgc ttgggtggag aggctattcg gctatgactg ggcacaacag acaatcggct 
2881 gctctgatgc cgccgtgttc cggctgtcag cgcaggggcg cccggttctt tttgtcaaga 
2941 ccgacctgtc cggtgccctg aatgaactgc aagacgaggc agcgcggcta tcgtggctgg 
3001 ccacgacggg cgttccttgc gcagctgtgc tcgacgttgt cactgaagcg ggaagggact 
3061 ggctgctatt gggcgaagtg ccggggcagg atctcctgtc atctcacctt gctcctgccg 
3121 agaaagtatc catcatggct gatgcaatgc ggcggctgca tacgcttgat ccggctacct 
3181 gcccattcga ccaccaagcg aaacatcgca tcgagcgagc acgtactcgg atggaagccg 
3241 gtcttgtcga tcaggatgat ctggacgaag agcatcaggg gctcgcgcca gccgaactgt 
3301 tcgccaggct caaggcgagc atgcccgacg gcgaggatct cgtcgtgacc catggcgatg 
3361 cctgcttgcc gaatatcatg gtggaaaatg gccgcttttc tggattcatc gactgtggcc 
3421 ggctgggtgt ggcggaccgc tatcaggaca tagcgttggc tacccgtgat attgctgaag 
3481 agcttggcgg cgaatgggct gaccgcttcc tcgtgcttta cggtatcgcc gctcccgatt 
3541 cgcagcgcat cgccttctat cgccttcttg acgagttctt ctgagcggga ctctggggtt 
3601 cgaaatgacc gaccaagcga cgcccaacct gccatcacga gatttcgatt ccaccgccgc 
3661 cttctatgaa aggttgggct tcggaatcgt tttccgggac gccggctgga tgatcctcca 
3721 gcgcggggat ctcatgctgg agttcttcgc ccaccctagg gggaggctaa ctgaaacacg 
3781 gaaggagaca ataccggaag gaacccgcgc tatgacggca ataaaaagac agaataaaac 
3841 gcacggtgtt gggtcgtttg ttcataaacg cggggttcgg tcccagggct ggcactctgt 
3901 cgatacccca ccgagacccc attggggcca atacgcccgc gtttcttcct tttccccacc 
3961 ccacccccca agttcgggtg aaggcccagg gctcgcagcc aacgtcgggg cggcaggccc 
4021 tgccatagcc tcaggttact catatatact ttagattgat ttaaaacttc atttttaatt 
4081 taaaaggatc taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga 
4141 gttttcgttc cactgagcgt cagaccccgt agaaaagatc aaaggatctt cttgagatcc 
4201 tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt 
4261 ttgtttgccg gatcaagagc taccaactct ttttccgaag gtaactggct tcagcagagc 
4321 gcagatacca aatactgtcc ttctagtgta gccgtagtta ggccaccact tcaagaactc 
4381 tgtagcaccg cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg 
4441 cgataagtcg tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg 
4501 gtcgggctga acggggggtt cgtgcacaca gcccagcttg gagcgaacga cctacaccga 
4561 actgagatac ctacagcgtg agctatgaga aagcgccacg cttcccgaag ggagaaaggc 
4621 ggacaggtat ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg agcttccagg 
4681 gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg 
4741 atttttgtga tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt 
4801 tttacggttc ctggcctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc 
4861 tgattctgtg gataaccgta ttaccgccat gcat (SEQ ID NO: 02) 





C. pDNR-Dual-Luc 

1 gcggccgcat aacttcgtat agcatacatt atacgaagtt atcagtcgac accatggaag 
61 acgccaaaaa cataaagaaa ggcccggcgc cattctatcc tctagaggat ggaaccgctg 
121 gagagcaact gcataaggct atgaagagat acgccctggt tcctggaaca attgctttta 
181 cagatgcaca tatcgaggtg aacatcacgt acgcggaata cttcgaaatg tccgttcggt 
241 tggcagaagc tatgaaacga tatgggctga atacaaatca cagaatcgtc gtatgcagtg 
301 aaaactctct tcaattcttt atgccggtgt tgggcgcgtt atttatcgga gttgcagttg 
361 cgcccgcgaa cgacatttat aatgaacgtg aattgctcaa cagtatgaac atttcgcagc 
421 ctaccgtagt gtttgtttcc aaaaaggggt tgcaaaaaat tttgaacgtg caaaaaaaat 
481 taccaataat tcagaaaatt attatcatgg attctaaaac ggattaccag ggatttcagt 
541 cgatgtacac gttcgtcaca tctcatctac ctcccggttt taatgagtac gattttgtac 
601 cagagtcctt tgatcgtgac aaaacaattg cactgataat gaattcctct ggatctactg 
661 ggttacctaa gggtgtggcc cttccgcata gaactgcctg cgtcagattc tcgcatgcca 
721 gagatcctat ttttggcaat caaatcattc cggatactgc gattttaagt gttgttccat 
781 tccatcacgg ttttggaatg tttactacac tcggatattt gatatgtgga tttcgagtcg 
841 tcttaatgta tagatttgaa gaagagctgt ttttacgatc ccttcaggat tacaaaattc 
901 aaagtgcgtt gctagtacca accctatttt cattcttcgc caaaagcact ctgattgaca 
961 aatacgattt atctaattta cacgaaattg cttctggggg cgcacctctt tcgaaagaag 
1021 tcggggaagc ggttgcaaaa cgcttccatc ttccagggat acgacaagga tatgggctca 
1081 ctgagactac atcagctatt ctgattacac ccgaggggga tgataaaccg ggcgcggtcg 
1141 gtaaagttgt tccatttttt gaagcgaagg ttgtggatct ggataccggg aaaacgctgg 
1201 gcgttaatca gagaggcgaa ttatgtgtca gaggacctat gattatgtcc ggttatgtaa 
1261 acaatccgga agcgaccaac gccttgattg acaaggatgg atggctacat tctggagaca 
1321 tagcttactg ggacgaagac gaacacttct tcatagttga ccgcttgaag tctttaatta 
1381 aatacaaagg atatcaggtg gcccccgctg aattggaatc gatattgtta caacacccca 
1441 acatcttcga cgcgggcgtg gcaggtcttc ccgacgatga cgccggtgaa cttcccgccg 
1501 ccgttgttgt tttggagcac ggaaagacga tgacggaaaa agagatcgtg gattacgtcg 
1561 ccagtcaagt aacaaccgcg aaaaagttgc gcggaggagt tgtgtttgtg gacgaagtac 
1621 cgaaaggtct taccggaaaa ctcgacgcaa gaaaaatcag agagatcctc ataaaggcca 
1681 agaagggcgg aaagtccaaa ttgaggatcc gggcccaggt gagtggtcat aatcataatc 
1741 ataatcataa tcataatcac aactagccta ggagatcctg gtcatgacta gtgcttggat 
1801 tctcaccaat aaaaaacgcc cggcggcaac cgagcgttct gaacaaatcc agatggagtt 
1861 ctgaggtcat tactggatct atcaacagga gtccaagcga gctcgatatc aaattacgcc 
1921 ccgccctgcc actcatcgca gtactgttgt aattcattaa gcattctgcc gacatggaag 
1981 ccatcacaaa cggcatgatg aacctgaatc gccagcggca tcagcacctt gtcgccttgc 
2041 gtataatatt tgcccatggt gaaaacgggg gcgaagaagt tgtccatatt ggccacgttt 
2101 aaatcaaaac tggtgaaact cacccaggga ttggctgaga cgaaaaacat attctcaata 
2161 aaccctttag ggaaataggc caggttttca ccgtaacacg ccacatcttg cgaatatatg 
2221 tgtagaaact gccggaaatc gtcgtggtat tcactccaga gcgatgaaaa cgtttcagtt 
2281 tgctcatgga aaacggtgta acaagggtga acactatccc atatcaccag ctcaccgtct 
2341 ttcattgcca tacgaaattc cggatgagca ttcatcaggc gggcaagaat gtgaataaag 
2401 gccggataaa acttgtgctt atttttcttt acggtcttta aaaaggccgt aatatccagc 
2461 tgaacggtct ggttataggt acattgagca actgactgaa atgcctcaaa atgttcttta 
2521 cgatgccatt gggatatatc aacggtggta tatccagtga tttttttctc cattttagct 
2581 tccttagctc ctgaaagatc cataacttcg tatagcatac attatacgaa gttatgcggc 
2641 cgcgacgtcc acatatacct gccgttcact attatttagt gaaatgagat attatgatat 
2701 tttctgaatt gtgattaaaa aggcaacttt atgcccatgc aacagaaact ataaaaaata 
2761 cagagaatga aaagaaacag atagattttt tagttcttta ggcccgtagt ctgcaaatcc 
2821 ttttatgatt ttctatcaaa caaaagagga aaatagacca gttgcaatcc aaacgagagt 
2881 ctaatagaat gaggtcgaaa agtaaatcgc gcgggtttgt tactgataaa gcaggcaaga 
2941 cctaaaatgt gtaaagggca aagtgtatac tttggcgtca ccccttacat attttaggtc 
3001 tttttttatt gtgcgtaact aacttgccat cttcaaacag gagggctgga agaagcagac 
3061 cgctaacaca gtacataaaa aaggagacat gaacgatgaa catcaaaaag tttgcaaaac 
3121 aagcaacagt attaaccttt actaccgcac tgctggcagg aggcgcaact caagcgtttg 
3181 cgaaagaaac gaaccaaaag ccatataagg aaacatacgg catttcccat attacacgcc 
3241 atgatatgct gcaaatccct gaacagcaaa aaaatgaaaa atatcaagtt cctgagttcg 
3301 attcgtccac aattaaaaat atctcttctg caaaaggcct ggacgtttgg gacagctggc 
3361 cattacaaaa cgctgacggc actgtcgcaa actatcacgg ctaccacatc gtctttgcat 
3421 tagccggaga tcctaaaaat gcggatgaca catcgattta catgttctat caaaaagtcg 
3481 gcgaaacttc tattgacagc tggaaaaacg ctggccgcgt ctttaaagac agcgacaaat 
3541 tcgatgcaaa tgattctatc ctaaaagacc aaacacaaga atggtcaggt tcagccacat 
3601 ttacatctga cggaaaaatc cgtttattct acactgattt ctccggtaaa cattacggca 
3661 aacaaacact gacaactgca caagttaacg tatcagcatc agacagctct ttgaacatca 
3721 acggtgtaga ggattataaa tcaatctttg acggtgacgg aaaaacgtat caaaatgtac 
3781 agcagttcat cgatgaaggc aactacagct caggcgacaa ccatacgctg agagatcctc 
3841 actacgtaga agataaaggc cacaaatact tagtatttga agcaaacact ggaactgaag 
3901 atggctacca aggcgaagaa tctttattta acaaagcata ctatggcaaa agcacatcat 
3961 tcttccgtca agaaagtcaa aaacttctgc aaagcgataa aaaacgcacg gctgagttag 
4021 caaacggcgc tctcggtatg attgagctaa acgatgatta cacactgaaa aaagtgatga 
4081 aaccgctgat tgcatctaac acagtaacag atgaaattga acgcgcgaac gtctttaaaa 
4141 tgaacggcaa atggtacctg ttcactgact cccgcggatc aaaaatgacg attgacggca 
4201 ttacgtctaa cgatatttac atgcttggtt atgtttctaa ttctttaact ggcccataca 
4261 agccgctgaa caaaactggc cttgtgttaa aaatggatct tgatcctaac gatgtaacct 
4321 ttacttactc acacttcgct gtacctcaag cgaaaggaaa caatgtcgtg attacaagct 
4381 atatgacaaa cagaggattc tacgcagaca aacaatcaac gtttgcgcct agcttcctgc 
4441 tgaacatcaa aggcaagaaa acatctgttg tcaaagacag catccttgaa caaggacaat 
4501 taacagttaa caaataaaaa cgcaaaagaa aatgccgata tcctattggc attgacgtca 
4561 ggtggcactt ttcggggaaa tgtgcgcgga acccctattt gtttattttt ctaaatacat 
4621 tcaaatatgt atccgctcat gagacaataa ccctgataaa tgcttcaata atattgaaaa 
4681 aggaagagta tgagtattca acatttccgt gtcgccctta ttcccttttt tgcggcattt 
4741 tgccttcctg tttttgctca cccagaaacg ctggtgaaag taaaagatgc tgaagatcag 
4801 ttgggtgcac gagtgggtta catcgaactg gatctcaaca gcggtaagat ccttgagagt 
4861 tttcgccccg aagaacgttt tccaatgatg agcactttta aagttctgct atgtggcgcg 
4921 gtattatccc gtattgacgc cgggcaagag caactcggtc gccgcataca ctattctcag 
4981 aatgacttgg ttgagtactc accagtcaca gaaaagcatc ttacggatgg catgacagta 
5041 agagaattat gcagtgctgc cataaccatg agtgataaca ctgcggccaa cttacttctg 
5101 acaacgatcg gaggaccgaa ggagctaacc gcttttttgc acaacatggg ggatcatgta 
5161 actcgccttg atcgttggga accggagctg aatgaagcca taccaaacga cgagcgtgac 
5221 accacgatgc ctgtagcaat ggcaacaacg ttgcgcaaac tattaactgg cgaactactt 
5281 actctagctt cccggcaaca attaatagac tggatggagg cggataaagt tgcaggacca 
5341 cttctgcgct cggcccttcc ggctggctgg tttattgctg ataaatctgg agccggtgag 
5401 cgtgggtctc gcggtatcat tgcagcactg gggccagatg gtaagccctc ccgtatcgta 
5461 gttatctaca cgacggggag tcaggcaact atggatgaac gaaatagaca gatcgctgag 
5521 ataggtgcct cactgattaa gcattggtaa ctgtcagacc aagtttactc atatatactt 
5581 tagattgatt taaaacttca tttttaattt aaaaggatct aggtgaagat cctttttgat 
5641 aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc agaccccgta 
5701 gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg ctgcttgcaa 
5761 acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct accaactctt 
5821 tttccgaagg taactggctt cagcagagcg cagataccaa atactgttct tctagtgtag 
5881 ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct cgctctgcta 
5941 atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg gttggactca 



Q 
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6001 agacgatagt taccggataa 

6061 cccagcttgg agcgaacgac 

6121 agcgccacgc ttcccgaagg 

6181 acaggagagc gcacgaggga 

5 6241 gggtttcgcc acctctgact 

63 01 ctatggaaaa acgccagcaa 

6361 gctcacatgt tctttcctgc 

6421 cgcgtgtaaa acgacggcca 

6481 cgccgcagcc gaacgaccga 



10 

D. pLPS-Luc-EGFP 







i 


tagttattaa 


tagtaatcaa 






61 


cgttacataa 


cttacggtaa 




15 


121 


gacgtcaata 


atgacgtatg 




181 


atgggtggag 


tatttacggt 






241 


aagtacgccc 


cctattgacg 






"301 


catgacctta 


tgggactttc 






361 


catggtgatg 


cggttttggc

ggcgcagcgg tcgggctgaa cggggggttc gtgcacacag
ctacaccgaa ctgagatacc tacagcgtga gctatgagaa
gagaaaggcg gacaggtatc cggtaagcgg cagggtcgga
gcttccaggg ggaaacgcct ggtatcttta tagtcctgtc
tgagcgtcga tttttgtgat gctcgtcagg ggggcggagc
cgcggccttt ttacggttcc tggccttttg ctggcctttt
gttatcccct gattctgtgg ataaccgtat taccgcctta
gtagatctgt aatacgactc actatagggc gctagctgct
gcgcagcgag tcagtgagcg aggaa (SEQ ID NO: 03)

ttacggggtc attagttcat agcccatata tggagttccg
atggcccgcc tggctgaccg cccaacgacc cccgcccatt
ttcccatagt aacgccaata gggactttcc attgacgtca
aaactgccca cttggcagta catcaagtgt atcatatgcc
tcaatgacgg taaatggccc gcctggcatt atgcccagta
ctacttggca gtacatctac gtattagtca tcgctattac
agtacatcaa tgggcgtgga tagcggtttg actcacgggg
421 atttccaagt ctccacccca ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg
481 ggactttcca aaatgtcgta acaactccgc cccattgacg caaatgggcg gtaggcgtgt
541 acggtgggag gtctatataa gcagagctgg tttagtgaac cgtcagatcc gctagcataa
601 cttcgtatag catacattat acgaagttat cagtcgacac catggaagac gccaaaaaca
661 taaagaaagg cccggcgcca ttctatcctc tagaggatgg aaccgctgga gagcaactgc
721 ataaggctat gaagagatac gccctggttc ctggaacaat tgcttttaca gatgcacata
781 tcgaggtgaa catcacgtac gcggaatact tcgaaatgtc cgttcggttg gcagaagcta
841 tgaaacgata tgggctgaat acaaatcaca gaatcgtcgt atgcagtgaa aactctcttc
901 aattctttat gccggtgttg ggcgcgttat ttatcggagt tgcagttgcg cccgcgaacg
961 acatttataa tgaacgtgaa ttgctcaaca gtatgaacat ttcgcagcct accgtagtgt
1021 ttgtttccaa aaaggggttg caaaaaattt tgaacgtgca aaaaaaatta ccaataattc
1081 agaaaattat tatcatggat tctaaaacgg attaccaggg atttcagtcg atgtacacgt
1141 tcgtcacatc tcatctacct cccggtttta atgagtacga ttttgtacca gagtcctttg
1201 atcgtgacaa aacaattgca ctgataatga attcctctgg atctactggg ttacctaagg
1261 gtgtggccct tccgcataga actgcctgcg tcagattctc gcatgccaga gatcctattt
1321 ttggcaatca aatcattccg gatactgcga ttttaagtgt tgttccattc catcacggtt
1381 ttggaatgtt tactacactc ggatatttga tatgtggatt tcgagtcgtc ttaatgtata
1441 gatttgaaga agagctgttt ttacgatccc ttcaggatta caaaattcaa agtgcgttgc
1501 tagtaccaac cctattttca ttcttcgcca aaagcactct gattgacaaa tacgatttat
1561 ctaatttaca cgaaattgct tctgggggcg cacctctttc gaaagaagtc ggggaagcgg
1621 ttgcaaaacg cttccatctt ccagggatac gacaaggata tgggctcact gagactacat
1681 cagctattct gattacaccc gagggggatg ataaaccggg cgcggtcggt aaagttgttc
1741 cattttttga agcgaaggtt gtggatctgg ataccgggaa aacgctgggc gttaatcaga
1801 gaggcgaatt atgtgtcaga ggacctatga ttatgtccgg ttatgtaaac aatccggaag
1861 cgaccaacgc cttgattgac aaggatggat ggctacattc tggagacata gcttactggg
1921 acgaagacga acacttcttc atagttgacc gcttgaagtc tttaattaaa tacaaaggat
1981 atcaggtggc ccccgctgaa ttggaatcga tattgttaca acaccccaac atcttcgacg
2041 cgggcgtggc aggtcttccc gacgatgacg ccggtgaact tcccgccgcc gttgttgttt
2101 tggagcacgg aaagacgatg acggaaaaag agatcgtgga ttacgtcgcc agtcaagtaa
2161 caaccgcgaa aaagttgcgc ggaggagttg tgtttgtgga cgaagtaccg aaaggtctta
2221 ccggaaaact cgacgcaaga aaaatcagag agatcctcat aaaggccaag aagggcggaa
2281 agtccaaatt gaggatccgg gcccaggtga gtggtcataa tcataatcat aatcataatc
2341 ataatcacaa ctagcctagg agatcctggt catgactagt gcttggattc tcaccaataa
2401 aaaacgcccg gcggcaaccg agcgttctga acaaatccag atggagttct gaggtcatta
2461 ctggatctat caacaggagt ccaagcgagc tcgatatcaa attacgcccc gccctgccac
2521 tcatcgcagt actgttgtaa ttcattaagc attctgccga catggaagcc atcacaaacg
2581 gcatgatgaa cctgaatcgc cagcggcatc agcaccttgt cgccttgcgt ataatatttg
2641 cccatggtga aaacgggggc gaagaagttg tccatattgg ccacgtttaa atcaaaactg
2701 gtgaaactca cccagggatt ggctgagacg aaaaacatat tctcaataaa ccctttaggg
2761 aaataggcca ggttttcacc gtaacacgcc acatcttgcg aatatatgtg tagaaactgc
2821 cggaaatcgt cgtggtattc actccagagc gatgaaaacg tttcagtttg ctcatggaaa
2881 acggtgtaac aagggtgaac actatcccat atcaccagct caccgtcttt cattgccata
2941 cgaaattccg gatgagcatt catcaggcgg gcaagaatgt gaataaaggc cggataaaac
3001 ttgtgcttat ttttctttac ggtctttaaa aaggccgtaa tatccagctg aacggtctgg
3061 ttataggtac attgagcaac tgactgaaat gcctcaaaat gttctttacg atgccattgg
3121 gatatatcaa cggtggtata tccagtgatt tttttctcca ttttagcttc cttagctcct
3181 gaaagatcca taacttcgta tagcatacat tatacgaagt tatagatcca atattattga
3241 agcatttatc agggttattg tctcatgagc ggatacatat ttgaatgtat ttagaaaaat
3301 aaacaaatag gggttccgcg cacatttccc cgaaaagtgc cacctgacgt ggatctcgag
3361 ctcaagcttc gaattcaggg tttccttgac aatatcatac ttatcctgtc cctttttttt
3421 ccacagctac cggtcgcgag caagggcgag gagctgttca ccggggtggt gcccatcctg
3481 gtcgagctgg acggcgacgt aaacggccac aagttcagcg tgtccggcga gggcgagggc
3541 gatgccacct acggcaagct gaccctgaag ttcatctgca ccaccggcaa gctgcccgtg
3601 ccctggccca ccctcgtgac caccctgacc tacggcgtgc agtgcttcag ccgctacccc
3661 gaccacatga agcagcacga cttcttcaag tccgccatgc ccgaaggcta cgtccaggag
3721 cgcaccatct tcttcaagga cgacggcaac tacaagaccc gcgccgaggt gaagttcgag
3781 ggcgacaccc tggtgaaccg catcgagctg aagggcatcg acttcaagga ggacggcaac
3841 atcctggggc acaagctgga gtacaactac aacagccaca acgtctatat catggccgac
3901 aagcagaaga acggcatcaa ggtgaacttc aagatccgcc acaacatcga ggacggcagc
3961 gtgcagctcg ccgaccacta ccagcagaac acccccatcg gcgacggccc cgtgctgctg
4021 cccgacaacc actacctgag cacccagtcc gccctgagca aagaccccaa cgagaagcgc
4081 gatcacatgg tcctgctgga gttcgtgacc gccgccggga tcactctcgg catggacgag
4141 ctgtacaagt aaagcggccg cgactctaga tcataatcag ccataccaca tttgtagagg
4201 ttttacttgc tttaaaaaac ctcccacacc tccccctgaa cctgaaacat aaaatgaatg
4261 caattgttgt tgttaacttg tttattgcag cttataatgg ttacaaataa agcaatagca
4321 tcacaaattt cacaaataaa gcattttttt cactgcattc tagttgtggt ttgtccaaac
4381 tcatcaatgt atcttaaggc gtaaattgta agcgttaata ttttgttaaa attcgcgtta
4441 aatttttgtt aaatcagctc attttttaac caataggccg aaatcggcaa aatcccttat
4501 aaatcaaaag aatagaccga gatagggttg agtgttgttc cagtttggaa caagagtcca
4561 ctattaaaga acgtggactc caacgtcaaa gggcgaaaaa ccgtctatca gggcgatggc
4621 ccactacgtg aaccatcacc ctaatcaagt tttttggggt cgaggtgccg taaagcacta
4681 aatcggaacc ctaaagggag cccccgattt agagcttgac ggggaaagcc ggcgaacgtg
4741 gcgagaaagg aagggaagaa agcgaaagga gcgggcgcta gggcgctggc aagtgtagcg
4801 gtcacgctgc gcgtaaccac cacacccgcc gcgcttaatg cgccgctaca gggcgcgtca
4861 ggtggcactt ttcggggaaa tgtgcgcgga acccctattt gtttattttt ctaaatacat
4921 tcaaatatgt atccgctcat gagacaataa ccctgataaa tgcttcaata atattgaaaa
4981 aggaagagtc ctgaggcgga aagaaccagc tgtggaatgt gtgtcagtta gggtgtggaa
5041 agtccccagg ctccccagca ggcagaagta tgcaaagcat gcatctcaat tagtcagcaa
5101 ccaggtgtgg aaagtcccca ggctccccag caggcagaag tatgcaaagc atgcatctca
5161 attagtcagc aaccatagtc ccgcccctaa ctccgcccat cccgccccta actccgccca
5221 gttccgccca ttctccgccc catggctgac taattttttt tatttatgca gaggccgagg
5281 ccgcctcggc ctctgagcta ttccagaagt agtgaggagg cttttttgga ggcctaggct
5341 tttgcaaaga tcgatcaaga gacaggatga ggatcgtttc gcatgattga acaagatgga
5401 ttgcacgcag gttctccggc cgcttgggtg gagaggctat tcggctatga ctgggcacaa
5461 cagacaatcg gctgctctga tgccgccgtg ttccggctgt cagcgcaggg gcgcccggtt
5521 ctttttgtca agaccgacct gtccggtgcc ctgaatgaac tgcaagacga ggcagcgcgg
5581 ctatcgtggc tggccacgac gggcgttcct tgcgcagctg tgctcgacgt tgtcactgaa
5641 gcgggaaggg actggctgct attgggcgaa gtgccggggc aggatctcct gtcatctcac
5701 cttgctcctg ccgagaaagt atccatcatg gctgatgcaa tgcggcggct gcatacgctt
5761 gatccggcta cctgcccatt cgaccaccaa gcgaaacatc gcatcgagcg agcacgtact
5821 cggatggaag ccggtcttgt cgatcaggat gatctggacg aagagcatca ggggctcgcg
5881 ccagccgaac tgttcgccag gctcaaggcg agcatgcccg acggcgagga tctcgtcgtg
5941 acccatggcg atgcctgctt gccgaatatc atggtggaaa atggccgctt ttctggattc
6001 atcgactgtg gccggctggg tgtggcggac cgctatcagg acatagcgtt ggctacccgt
6061 gatattgctg aagagcttgg cggcgaatgg gctgaccgct tcctcgtgct ttacggtatc
6121 gccgctcccg attcgcagcg catcgccttc tatcgccttc ttgacgagtt cttctgagcg
6181 ggactctggg gttcgaaatg accgaccaag cgacgcccaa cctgccatca cgagatttcg
6241 attccaccgc cgccttctat gaaaggttgg gcttcggaat cgttttccgg gacgccggct
6301 ggatgatcct ccagcgcggg gatctcatgc tggagttctt cgcccaccct agggggaggc
6361 taactgaaac acggaaggag acaataccgg aaggaacccg cgctatgacg gcaataaaaa
6421 gacagaataa aacgcacggt gttgggtcgt ttgttcataa acgcggggtt cggtcccagg
6481 gctggcactc tgtcgatacc ccaccgagac cccattgggg ccaatacgcc cgcgtttctt
6541 ccttttcccc accccacccc ccaagttcgg gtgaaggccc agggctcgca gccaacgtcg
6601 gggcggcagg ccctgccata gcctcaggtt actcatatat actttagatt gatttaaaac
6661 ttcattttta atttaaaagg atctaggtga agatcctttt tgataatctc atgaccaaaa
6721 tcccttaacg tgagttttcg ttccactgag cgtcagaccc cgtagaaaag atcaaaggat
6781 cttcttgaga tccttttttt ctgcgcgtaa tctgctgctt gcaaacaaaa aaaccaccgc
6841 taccagcggt ggtttgtttg ccggatcaag agctaccaac tctttttccg aaggtaactg
6901 gcttcagcag agcgcagata ccaaatactg tccttctagt gtagccgtag ttaggccacc
6961 acttcaagaa ctctgtagca ccgcctacat acctcgctct gctaatcctg ttaccagtgg
7021 ctgctgccag tggcgataag tcgtgtctta ccgggttgga ctcaagacga tagttaccgg
7081 ataaggcgca gcggtcgggc tgaacggggg gttcgtgcac acagcccagc ttggagcgaa
7141 cgacctacac cgaactgaga tacctacagc gtgagctatg agaaagcgcc acgcttcccg
7201 aagggagaaa ggcggacagg tatccggtaa gcggcagggt cggaacagga gagcgcacga
7261 gggagcttcc agggggaaac gcctggtatc tttatagtcc tgtcgggttt cgccacctct
7321 gacttgagcg tcgatttttg tgatgctcgt caggggggcg gagcctatgg aaaaacgcca
7381 gcaacgcggc ctttttacgg ttcctggcct tttgctggcc ttttgctcac atgttctttc
7441 ctgcgttatc ccctgattct gtggataacc gtattaccgc catgcat (SEQ ID NO: 04)



Example 3. Representative Splice Donor and Acceptor Sites 

A. Consensus Splice Donor and Acceptor oligos: 

Consensus splice donor: 

(cloned into pDNR-1 at ApaI and AvrII sites)

Site of Exon/intron boundary _J 

top : CAGGTGAGTTAGGTAAGTGAACATGGTCATAGCTGTTTC 
bottom: CCGGGTCCACTCAATCCATTCACTTGTACCAGTATCGACAAAGGATC 

(SEQ ID NOS: 05 & 06) 
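
The pairing of the two consensus splice donor strands can be checked with a short script. This is only a minimal sketch: it assumes that the bottom strand is printed 3' to 5' (left to right) beneath the top strand, and that the first and last four bases of the bottom strand are single-stranded overhangs left for the ApaI and AvrII sites. The helper name `complement` is hypothetical and is not part of the specification.

```python
# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the base-by-base complement of seq, without reversing it."""
    return seq.upper().translate(COMPLEMENT)


# Consensus splice donor oligos as printed (SEQ ID NOS: 05 & 06).
top = "CAGGTGAGTTAGGTAAGTGAACATGGTCATAGCTGTTTC"
bottom = "CCGGGTCCACTCAATCCATTCACTTGTACCAGTATCGACAAAGGATC"

# Assumption: four-base overhangs at each end of the bottom strand.
left_overhang = bottom[:4]
core = bottom[4:-4]
right_overhang = bottom[-4:]

# The core of the bottom strand should pair with every base of the top strand.
is_paired = core == complement(top)
print(left_overhang, right_overhang, is_paired)
```

Under these assumptions the core of the bottom strand is the exact complement of the top strand, and the two overhangs are CCGG and GATC as printed.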

Consensus splice acceptor (includes branch site): 
(cloned into pEGFP-N1 at EcoRI and AgeI sites)

Site of Exon/intron boundary J_ 

top : AATTCAGGGTTTCCTTGACAATATCATACTTATCCTGTCCCTTTTTTTTCCACAGCTA 
bottom: GTCCCAAAGGAACTGTTATAGTATGAATAGGACAGGGAAAAAAAAGGTGTCGATGGCC 

(SEQ ID NOS:07 & 08) 

B. Splice donor from Human hemoglobin Beta 
Sequence encoding exon and intron sequence flanking the start of Human
Hemoglobin Beta intron I:

Site of Exon/intron boundary _J

top : AGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGACAGGT
bottom: TCAACCACCACTCCGGGACCCGTCCAACCATAGTTCCAATGTTCTGTCCA

(SEQ ID NOS: 09 & 10)

This splice donor sequence was encoded within the following oligo to enable
cloning into pDNR-1 at the ApaI and AvrII sites. Note that this oligo was
additionally designed to place stop codons (TAG and TAA) in the two unused
reading frames present in the MCS of pDNR-1. (The frame utilized is defined as
starting with the first base of the loxP site in pDNR-1.) In addition, an (HN)6
tag is encoded in frame with the utilized frame to enable protein purification
in bacteria; this tag is encoded directly after the intron sequence shown
above.

Oligo for Splice Donor from Human Hemoglobin Intron I with added Stops and
(HN)6 tag:

Site of Exon/intron boundary

Top :

CGTAGTGTAAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGACAGGTCATAATCATAATCATAATCATAATCATAATCACAACTAGC

Bottom:

CCGGGCATCACATTTCAACCACCACTCCGGGACCCGTCCAACCATAGTTCCAATGTTCTGTCCAGTATTAGTATTAGTATTAGTATTAGTATTAGTGTTG
ATCGGATC

(SEQ ID NOS: 11 & 12)
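
The reading-frame design described above can be illustrated with a short scan of the top strand. This is a sketch under stated assumptions: frames are numbered relative to the first base of the oligo as printed (not relative to the loxP site of pDNR-1, whose offset is not given here), and the helper `first_stop` is a hypothetical name introduced only for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq, frame):
    """Return (index, codon) of the first stop codon read in `frame`, or None."""
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i, codon
    return None


# Top strand of the splice donor oligo as printed (SEQ ID NO: 11).
oligo_top = (
    "CGTAGTGTAAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGACAGGT"
    "CATAATCATAATCATAATCATAATCATAATCACAACTAGC"
)

# Start of the (HN)6 tag: the Gly codon GGT followed by the first His-Asn pair.
tag_start = oligo_top.find("GGTCATAAT")
tag_frame = tag_start % 3

# Scan only the region upstream of the tag, one reading frame at a time.
upstream = oligo_top[:tag_start]
stops = {frame: first_stop(upstream, frame) for frame in range(3)}

for frame, hit in stops.items():
    print(frame, hit)
print("tag frame:", tag_frame)
```

With this numbering, the frame that carries the tag stays open up to the tag, while each of the other two frames meets one of the engineered stop codons near the 5' end of the oligo.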

Sequence for (HN)6 tag within Splice donor oligo:

Top : GGT CAT AAT CAT AAT CAT AAT CAT AAT CAT AAT CAC AAC TAG

Bottom: CCA GTA TTA GTA TTA GTA TTA GTA TTA GTA TTA GTG TTG ATC

Peptide encoded: Gly His Asn His Asn His Asn His Asn His Asn His Asn stop

(SEQ ID NOS: 13, 14 & 15)
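
The peptide line can be reproduced by translating the top strand codon by codon. The sketch below deliberately uses a partial codon table that covers only the codons present in this tag (standard genetic code); the names `CODONS` and `translate` are illustrative and do not come from the specification.

```python
# Partial standard codon table: only the codons that occur in the (HN)6 tag.
CODONS = {
    "GGT": "Gly",
    "CAT": "His",
    "CAC": "His",
    "AAT": "Asn",
    "AAC": "Asn",
    "TAG": "stop",
}


def translate(dna):
    """Translate codon by codon from the first base, stopping at a stop codon."""
    dna = dna.replace(" ", "").upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODONS[dna[i:i + 3]]
        peptide.append(residue)
        if residue == "stop":
            break
    return peptide


tag_top = "GGT CAT AAT CAT AAT CAT AAT CAT AAT CAT AAT CAC AAC TAG"
peptide = translate(tag_top)
print(" ".join(peptide))
```

The result is a glycine followed by six His-Asn repeats and a stop, which matches the peptide given above.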



Splice acceptor from Human hemoglobin Beta 



This oligo encodes the splice acceptor region of intron I from Human Hemoglobin
Beta together with flanking exon sequence. It was cloned into pEGFP-N1 at the
AgeI and EcoRI sites.

Oligo for Human Hemoglobin Beta splice acceptor from Intron I:
Site of Exon/intron boundary

Top :

AATTCTTGGGTTTCTGATAGGCACTGACTCTCTCTGCCGATTGGTCTATTTTCCCACCCTTAGGCTGCTGGTGGTCTACC
CTTGGACCCTA

Bottom:

GAACCCAAAGACTATCCGTGACTGAGAGAGACGGCTAACCAGATAAAAGGGTGGGAATCCGACGACCACCAGATGGGAAC
CTGGGATGGCC

(SEQ ID NOS: 16 & 17)

It is evident from the above results and discussion that the subject
invention provides an efficient method to transfer a nucleic acid from a first
vector to a second vector, where the subject methods do not employ digestion
and ligation protocols. Advantages provided by the subject invention include:
the ability to transfer or clone a nucleic acid of interest from a single donor
into a variety of different expression vectors at substantially the same time
and in a known orientation and reading frame; the ability to readily identify
successful clones; the ability to transfer many different genes to one or more
expression vectors simultaneously; the elimination of any need to sequence the
junctions of the transferred fragment and the expression vector or to
resequence the transferred gene; and the like. Another advantage of the subject
invention is that it provides for introns in the product vector, so as to
remove any unwanted sequences from the final encoded product and/or to easily
produce N- and/or C-terminal tagged fusion proteins. As such, the subject
invention represents a significant contribution to the art.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application
were specifically and individually indicated to be incorporated by reference. The
citation of any publication is for its disclosure prior to the filing date and should not
be construed as an admission that the present invention is not entitled to
antedate such publication by virtue of prior invention.

Although the foregoing invention has been described in some detail by way
of illustration and example for purposes of clarity of understanding, it is readily
apparent to those of ordinary skill in the art in light of the teachings of this
invention that certain changes and modifications may be made thereto without
departing from the spirit or scope of the appended claims.